As a developer, having the ability to quickly and reliably download files from the internet is critical. The open-source wget command offers advanced file download capabilities that every developer should have in their toolkit.
In this comprehensive guide, we will dive deep into using wget in the Windows PowerShell environment. Whether you are looking to mirror websites, resume failed downloads, or tweak low-level options, this guide has you covered. Follow along as we explore real-world examples and unlock the full potential of PowerShell's wget alias.
An Introduction to wget
Wget originated as a command-line utility on Linux and UNIX platforms for downloading content from the web. Some of its notable features include:
- Ability to download files from HTTP, HTTPS, and FTP servers
- Resumable downloads for large files or unreliable connections
- Recursive mirroring of entire folder structures and websites
- Customizable number of connection retries and timeouts
- Bandwidth throttling for managing bandwidth utilization
- No external library dependencies – runs solely from command line
As a developer, having these advanced networking capabilities built directly into the command line unlocks new levels of automation and simplifies the process of moving files between systems.
Fortunately, Microsoft made wget available in PowerShell 3.0 and beyond by essentially aliasing it to Invoke-WebRequest. When you run the wget command, PowerShell handles everything seamlessly in the background. Two caveats are worth knowing up front: the alias only understands Invoke-WebRequest parameters, so the GNU-style double-dash flags shown later in this guide require a native GNU wget build for Windows; and the alias was removed in PowerShell 6 and later, where you call Invoke-WebRequest directly.
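You can confirm the mapping in your own session:
Get-Alias wget
On Windows PowerShell 5.1 this prints the alias resolving to Invoke-WebRequest, which is why Invoke-WebRequest parameters like -OutFile work with the wget name.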
Now let's explore some practical examples and functionality provided by wget in PowerShell.
Downloading a Single File
The most basic wget usage scenario is downloading a file from a web server to your local system.
Here is a simple example that downloads a page from a web server:
wget https://distrowatch.com/table.php?distribution=garuda
With GNU wget, this saves the target file to your current working directory using the file name from the URL (in this case table.php). The PowerShell alias, however, does not save to disk on its own; it returns a response object unless you pass -OutFile.
You can also explicitly define the local output filename and path like so:
wget https://distrowatch.com/table.php?distribution=garuda -OutFile C:\Users\John\Downloads\garuda.html
When downloading a single resource, wget in PowerShell behaves much like requesting the file directly in your browser. The main advantage is avoiding the interactive user interface and enabling scripted automation.
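As a minimal sketch of what that automation can look like, here is the same download wrapped in basic error handling (the output file name is illustrative):
try {
    wget "https://distrowatch.com/table.php?distribution=garuda" -OutFile "garuda.html"
    Write-Host "Download complete"
}
catch {
    Write-Warning "Download failed: $($_.Exception.Message)"
}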
Measuring Download Speeds
One neat trick developers love is benchmarking download speeds using wget's built-in timing metrics.
Use the --output-file argument to log bandwidth metrics to a text file:
wget https://speed.hetzner.de/1GB.bin --output-file=speed-log.txt
This will save download timing information including the bytes transferred and overall speed:
2022-07-27 16:21:03 (1.01 MB/s) - ‘1GB.bin’ saved [1073741824/1073741824]
By running against a large test file on a server with plenty of bandwidth, you can accurately measure the maximum download capabilities of your internet connection.
This is extremely useful when profiling new infrastructure or troubleshooting performance.
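If you are on the PowerShell alias rather than GNU wget, --output-file is unavailable, but Measure-Command gives comparable numbers. A rough sketch reusing the test URL above:
$url = "https://speed.hetzner.de/1GB.bin"
$elapsed = Measure-Command { wget $url -OutFile "1GB.bin" }
$mb = (Get-Item "1GB.bin").Length / 1MB   # bytes downloaded, converted to megabytes
Write-Host ("{0:N2} MB in {1:N1}s = {2:N2} MB/s" -f $mb, $elapsed.TotalSeconds, ($mb / $elapsed.TotalSeconds))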
Resuming Failed Downloads
One of wget's killer features is the ability to resume failed downloads from where they left off. This saves you from restarting at 0% when file transfers get unexpectedly interrupted.
To resume a partial download from a remote server, use wget's -c flag:
wget -c https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso
As long as the server supports byte serving, wget will send a Range header to continue from the previous byte position rather than starting over.
Automating Download Retries
Taking this one step further, you can configure automatic download retries by specifying --tries=N.
For example, this command will attempt the download 20 times before considering it a failure:
wget --tries=20 https://download.visualstudio.microsoft.com/download/pr/c4fef23e-cc45-4836-9544-70e213134bc8/1ee5717e9a1e05315b7dc868a629e602/dotnet-sdk-6.0.100-win-x64.exe
Combine this with scheduled task automation, and you have a resilient system for attempting downloads even across system restarts!
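The PowerShell alias has no --tries equivalent, but a simple retry loop achieves the same effect. A sketch with a placeholder URL and an arbitrary 5-second delay:
$url = "https://example.com/dotnet-sdk.exe"   # placeholder URL
for ($i = 1; $i -le 20; $i++) {
    try {
        wget $url -OutFile "dotnet-sdk.exe"
        break   # success, stop retrying
    }
    catch {
        Write-Warning "Attempt $i failed, retrying in 5 seconds..."
        Start-Sleep -Seconds 5
    }
}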
Downloading Multiple Files
Wget also makes it easy to pull down multiple files in a single command by specifying all URLs sequentially:
wget https://example.com/img1.png https://example.com/img2.png https://example.com/img3.png
You can utilize this when seeding sample data, setting up developer sandbox environments, or compiling local mirrors of resources commonly needed when offline.
This becomes even more powerful when coupled with file list text documents. For example, create a simple text file like:
urls.txt
---------
https://file1.com
https://file2.com
https://file3.com
Then use wget to iterate through and download everything automatically:
wget -i urls.txt
As you can see, a simple input file makes it straightforward to scale up wide-reaching download workflows.
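The PowerShell alias lacks -i as well, but reading the same list with Get-Content is a close equivalent. A minimal sketch:
Get-Content urls.txt | ForEach-Object {
    $name = Split-Path $_ -Leaf   # derive a local file name from the URL
    wget $_ -OutFile $name
}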
Accelerating Downloads with Multiple Connections
In GNU wget, each file is fetched over a single connection, and the -t parameter sets the number of retry attempts rather than simultaneous connections:
wget -t 10 http://distro.ibiblio.org/quirky/quirky6/x86_64/quirky-6.1.2.iso
Here the ISO download will be retried up to 10 times on failure. To actually improve throughput, run several downloads in parallel, one process per file; with multiple TCP streams in flight, you can significantly speed up multi-file transfers based on your available bandwidth.
Just be careful not to overdo it and saturate your pipe or crash underpowered servers!
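If you want genuine parallelism from PowerShell, one approach is launching one single-connection download per file with ForEach-Object -Parallel (PowerShell 7+, where the wget alias no longer exists, so Invoke-WebRequest is called directly). The URLs here are illustrative:
$urls = @(
    "https://example.com/part1.iso",
    "https://example.com/part2.iso"
)
$urls | ForEach-Object -Parallel {
    Invoke-WebRequest $_ -OutFile (Split-Path $_ -Leaf)   # one connection per file
} -ThrottleLimit 10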
Recursively Downloading Websites
One of wget's most popular uses is fully mirroring websites for offline development and testing.
The --mirror argument downloads a site recursively, following the rules in the site's robots.txt file:
wget --mirror https://classic.yarnpkg.com/en/docs
Alternatively, pass --page-requisites and --convert-links to more aggressively mirror an exact site copy:
wget --page-requisites --convert-links https://handlebarsjs.com
The whole folder structure is retained locally for fully offline static browsing, with images, CSS, and JS assets stored appropriately relative to the HTML files.
This technique is immensely useful for pulling down copies of documentation, frameworks, tutorials, and more for offline study.
Mirroring Only Certain Resources
When downloading large sites, you may want to target only specific file types in order to conserve bandwidth and disk space.
For example, to selectively mirror only PDF documents:
wget -r -A.pdf https://www.healthit.gov/topic/safety
The -A flag accepts a comma-separated list of file name suffixes or glob patterns to accept. Combine this with the recursive flag to walk through links on a domain pulling only PDFs.
Similarly, reject unwanted content using --reject:
wget -r --reject "*.css,*.js" https://apidocs.chef.io/
Here CSS and JS files are omitted, allowing you to save docs and images from Chef's API for local viewing minus the styling.
Such selectivity paired with wget‘s breadth makes capturing relevant subsets of data a breeze!
Advanced Download Options
So far we have covered basic usage of wget for common scenarios like mirroring websites and resuming downloads. Now let's explore some of the more advanced functionality and tweaks available.
Authentication Options
Many downloads may require authentication before permitting access. Wget offers a few ways to handle this:
- Basic Auth – pass a username and password via --user and --password (add --auth-no-challenge to send credentials without waiting for a server challenge)
- Client Certificates – specify public and private keys through --certificate and --private-key
- Bearer Tokens – manually inject an Authorization header with --header
For example:
wget --header="Authorization: Bearer fdslkjfds23232..." https://api.acme.com/endpoint
This grants access by directly providing the expected JWT without the formal login flow.
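With the PowerShell alias, the same header goes through Invoke-WebRequest's -Headers parameter; the token below is a placeholder:
$headers = @{ Authorization = "Bearer fdslkjfds23232..." }
wget "https://api.acme.com/endpoint" -Headers $headers -OutFile "response.json"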
Custom User Agents
Websites can identify and block traffic from wget based on the default user agent values.
Spoof a more typical browser agent string using the --user-agent argument:
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..." https://www.amazon.com
This provides sites the header values they expect, letting automation scripts blend in with ordinary browser traffic.
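The PowerShell alias exposes the same knob through the -UserAgent parameter:
wget "https://www.amazon.com" -UserAgent "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" -OutFile "amazon.html"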
Rate Limiting Bandwidth
To avoid saturating connections and overwhelming networks, use --limit-rate to throttle transfer speeds. The value is in bytes per second by default, with k and m suffixes for kilobytes and megabytes:
For example:
wget --limit-rate=3M https://cdnjs.com/libraries/lodash.js/4.17.21
Here the download is restricted to 3 MB/s to avoid overwhelming the server.
Apply this cautiously across large collections of file transfers to reduce impact.
Password Protecting Archives
When downloading sensitive files, you may want to encrypt them or set a password for access.
Wget itself has no option to encrypt or password-protect what it saves; the --restrict-file-names flag, sometimes cited for this, only sanitizes the characters used in local file names. Instead, archive the downloaded files with a password after the transfer completes.
Desktop archiving tools like 7-Zip handle this well and can be called directly from PowerShell. This keeps your downloads secure at rest on the filesystem.
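As a sketch, assuming 7-Zip's default install path and an inline password for brevity (a real script should prompt for the secret or pull it from a vault):
$sevenZip = "C:\Program Files\7-Zip\7z.exe"   # assumed install location
& $sevenZip a "-pS3cret!" downloads.7z .\garuda.html   # add the file to a password-protected archive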
Common wget Pitfalls
While extremely useful, wget does come with some nuances to be aware of – especially when running in PowerShell.
Mangled Binary Downloads
One issue developers commonly encounter is non-HTML content getting corrupted when saved through wget.
Windows PowerShell 5.1's Invoke-WebRequest attempts to parse every response with the Internet Explorer HTML engine, which can mangle or stall BIN files, PDFs, images and more.
The fix is to skip DOM parsing with -UseBasicParsing and write the raw bytes with -OutFile:
wget -UseBasicParsing https://filebin.net/om1meobgydw8m04c/test.pdf -OutFile test.pdf
This properly saves the raw file contents unmodified. (In PowerShell 6 and later, basic parsing is the only behavior, so the switch is accepted but has no effect.)
Certificate Errors
Secure HTTPS connectivity often breaks the download process with SSL verification issues:
Write-Error : Unable to connect to the remote server
On corporate networks these failures often trace back to an authenticating proxy rather than the certificate itself. Use the -UseDefaultCredentials option so the request signs in with your logged-on Windows account for internal domains and corporate proxies:
wget -UseDefaultCredentials https://mycompany.sharepoint.com
This authenticates using your current Windows credentials to access the resource.
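When the failure really is a certificate problem, such as a self-signed certificate on an internal host, PowerShell 6 and later add -SkipCertificateCheck. Use it only for hosts you trust; the hostname below is a placeholder:
Invoke-WebRequest "https://internal.mycompany.local" -SkipCertificateCheck -OutFile "page.html"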
Access Denied
Despite proper credentials, you may encounter access errors to restricted resources:
Invoke-WebRequest : Authorization failed from the remote server
Some systems block anything that looks like automated wget traffic. Work around this by passing a vanilla browser user agent string:
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; ..." https://docs.mydomain.com
This tricks endpoints into allowing your scripted session through, essentially camouflaging it as a normal desktop browser request.
Tips and Tricks
Let's round out the guide with some helpful tips for mastering wget in your PowerShell developer toolkit:
Continue Broken Transfers
Resume failed partial downloads using -c:
wget -c https://fileserver.com/files/disk-image.vhd
This continues an existing transfer vs. discarding past progress.
Stay Within the Target Directory
Mirror a remote path hierarchy locally, without crawling up into parent directories, by adding -np (--no-parent):
wget -r -np https://assets.cms.gov/resources/framework
The nested folder and file layout beneath that path is retained as-is.
Skip Already-Downloaded Files
Avoid cluttering disks and re-downloading files with matching names using -nc:
wget -nc https://sourceforge.net
With --no-clobber, items that already exist locally are skipped rather than downloaded again.
Pull Specific Sites
When recursing, you can span hosts while only spidering certain domains by combining -H with -D:
wget -r -H -D site1.com,site2.com https://server.com
This restricts mirroring to just resources living under site1.com and site2.com, avoiding anything else linked.
Configure Retries
Handle flaky infrastructure by pairing a retry limit with a retry delay: --tries caps the attempts, while --waitretry sets the maximum seconds of linear backoff between them:
wget --tries=3 --waitretry=10 https://volatile.ds
Here failed requests are retried up to 3 times, with wget waiting progressively longer (up to 10 seconds) between attempts, improving resilience.
Wrapping Up
That wraps up our deep dive into wielding the full power of wget in Windows PowerShell. While mainly intended for Linux environments, Microsoft did developers a solid by aliasing the command to Invoke-WebRequest, and a native GNU wget build covers the flags the alias lacks.
We explored simple single file downloads, resuming broken transfers, recursively mirroring directories, throttling speeds, automating authentication and more. You are now equipped with expert techniques for scripting robust download workflows.
Wget skills unlock the potential for connecting disjointed infrastructure, capturing internet resources for offline use, transferring bulk data, and backing up critical production artifacts in a pinch. Add this tool to your PowerShell arsenal to enable reliable downloads via code!