As a developer, having the ability to quickly and reliably download files from the internet is critical. The open-source wget command offers advanced file download capabilities that every developer should have in their toolkit.

In this comprehensive guide, we will dive deep into using wget in the Windows PowerShell environment. Whether you are looking to mirror websites, resume failed downloads, or tweak low-level options, this guide has you covered. Follow along as we explore real-world examples and unlock the full potential of PowerShell's wget alias.

An Introduction to wget

Wget originated as a command-line utility on Linux and UNIX platforms for downloading content from the web. Some of its notable features include:

  • Ability to download files from HTTP, HTTPS, and FTP servers
  • Resumable downloads for large files or unreliable connections
  • Recursive mirroring of entire folder structures and websites
  • Customizable number of connection retries and timeouts
  • Bandwidth throttling to limit how much of a connection a transfer consumes
  • No external library dependencies – runs entirely from the command line

As a developer, having these advanced networking capabilities built directly into the command line unlocks new levels of automation and simplifies the process of moving files between systems.

Fortunately, in Windows PowerShell 3.0 through 5.1 Microsoft aliased wget to the built-in Invoke-WebRequest cmdlet, so typing wget runs Invoke-WebRequest behind the scenes. Two caveats matter for the rest of this guide: the alias only understands Invoke-WebRequest parameters (such as -OutFile and -UseBasicParsing), and it was removed in PowerShell 6 and later. The GNU-style options shown below (-c, --tries, --mirror, and so on) require the real GNU wget binary for Windows, which you can install with a package manager such as Chocolatey and run from the same PowerShell prompt.
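
You can confirm the mapping yourself; Get-Alias shows where the command actually points on Windows PowerShell 5.1:

Get-Alias wget        # Alias    wget -> Invoke-WebRequest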

Now let's explore some practical examples and functionality provided by wget in PowerShell.

Downloading a Single File

The most basic wget usage scenario is downloading a file from a web server to your local system.

Here is a simple example that fetches a page from DistroWatch:

wget https://distrowatch.com/table.php?distribution=garuda

With GNU wget, this saves the target file to your current working directory, using the filename from the URL (in this case table.php). The PowerShell alias, by contrast, returns a response object and only writes to disk when you explicitly define the local output filename and path with -OutFile:

wget https://distrowatch.com/table.php?distribution=garuda -OutFile C:\Users\John\Downloads\garuda.html

When downloading a single resource, wget and PowerShell behave much like opening the URL directly in your browser. The main advantage is skipping the interactive user interface, which enables scripted automation.

Measuring Download Speeds

One neat trick developers love is benchmarking download speeds using wget's built-in timing metrics.

Use GNU wget's --output-file argument to log its progress output, including bandwidth metrics, to a text file:

wget https://speed.hetzner.de/1GB.bin --output-file=speed-log.txt

This will save download timing information including the bytes transferred and overall speed:

2022-07-27 16:21:03 (1.01 MB/s) - ‘1GB.bin’ saved [1073741824/1073741824]

By running against a large test file on a server with plenty of bandwidth, you can accurately measure the maximum download capabilities of your internet connection.

This is extremely useful when profiling new infrastructure or troubleshooting performance.
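
If only the built-in alias is available, Measure-Command gives a comparable timing figure. This is a minimal sketch: it reuses the same public 1 GB test file, saves it under a placeholder name, and computes the average rate from the elapsed time:

$timing = Measure-Command {
    Invoke-WebRequest https://speed.hetzner.de/1GB.bin -OutFile "speedtest.bin" -UseBasicParsing
}
"Average speed: {0:N2} MB/s" -f ((1GB / $timing.TotalSeconds) / 1MB)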

Resuming Failed Downloads

One of wget's killer features is the ability to resume failed downloads from where they left off. This saves you from having to restart from 0% when file transfers get unexpectedly interrupted.

To resume a partial download from a remote server, use wget's -c flag:

wget -c https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso

As long as the server supports byte serving, wget will add the Range header to continue from the previous byte position rather than starting over.
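
Note that -c is a GNU wget flag; the PowerShell alias does not understand it. On PowerShell 7 (where the alias no longer exists), Invoke-WebRequest offers a roughly comparable -Resume switch, shown here with a placeholder destination path:

Invoke-WebRequest https://releases.ubuntu.com/22.04/ubuntu-22.04.1-live-server-amd64.iso -OutFile C:\Temp\ubuntu.iso -Resume   # appends to the partial file if the server supports ranges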

Automating Download Retries

Taking this one step further, you can configure automatic download retries by specifying --tries=N.

For example, this command will attempt the download 20 times before considering it a failure:

wget --tries=20 https://download.visualstudio.microsoft.com/download/pr/c4fef23e-cc45-4836-9544-70e213134bc8/1ee5717e9a1e05315b7dc868a629e602/dotnet-sdk-6.0.100-win-x64.exe

Combine this with scheduled task automation, and you have a resilient system for attempting downloads even across system restarts!
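
If you only have the alias, a small retry loop achieves the same effect. This is a sketch with a placeholder URL and destination, not a drop-in replacement for --tries:

$url  = "https://example.com/big-installer.exe"   # placeholder download URL
$dest = "C:\Temp\big-installer.exe"               # placeholder destination
foreach ($attempt in 1..20) {
    try {
        Invoke-WebRequest -Uri $url -OutFile $dest -UseBasicParsing
        break                                     # stop once a download succeeds
    } catch {
        Write-Warning "Attempt $attempt failed: $($_.Exception.Message)"
        Start-Sleep -Seconds 5                    # brief pause before the next attempt
    }
}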

Downloading Multiple Files

Wget also makes it easy to pull down multiple files in a single command by specifying all URLs sequentially:

wget https://example.com/img1.png https://example.com/img2.png https://example.com/img3.png

You can utilize this when seeding sample data, setting up developer sandbox environments, or compiling local mirrors of resources commonly needed when offline.

This becomes even more powerful when coupled with file list text documents. For example, create a simple text file like:

urls.txt
---------
https://file1.com
https://file2.com 
https://file3.com

Then use wget to iterate through and download everything automatically:

wget -i urls.txt

As you can see, a plain text list of URLs makes it straightforward to scale up wide-reaching download workflows.
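
The -i option is, again, GNU wget. With only the PowerShell alias available, a short loop over the same urls.txt does the equivalent job; the local file names here are crudely derived from the last segment of each URL:

Get-Content urls.txt | ForEach-Object {
    $name = [System.IO.Path]::GetFileName($_)     # last URL segment as the file name
    if (-not $name) { $name = "index.html" }      # fallback for bare domain URLs
    Invoke-WebRequest -Uri $_ -OutFile $name -UseBasicParsing
}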

Accelerating Downloads with Multiple Connections

A common misconception is that wget can split a single file across several simultaneous connections. It cannot: wget fetches each file over a single connection, and the -t parameter is actually shorthand for --tries, the retry count.

wget -t 10 http://distro.ibiblio.org/quirky/quirky6/x86_64/quirky-6.1.2.iso

The command above therefore retries the ISO download up to 10 times rather than opening 10 parallel streams. If you genuinely need segmented, multi-connection downloads of a single file, a dedicated tool such as aria2 is the usual choice; what you can parallelize easily from PowerShell is fetching several different files at once, as sketched below.

Just be careful not to overdo it and saturate your pipe or crash underpowered servers!
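
For batches of separate files, PowerShell 7 can fan the work out itself with ForEach-Object -Parallel. Here is a sketch reusing the urls.txt list from earlier (the throttle value is arbitrary):

Get-Content urls.txt | ForEach-Object -Parallel {
    $name = [System.IO.Path]::GetFileName($_)
    if (-not $name) { $name = "index.html" }
    Invoke-WebRequest -Uri $_ -OutFile $name
} -ThrottleLimit 4                                # at most four simultaneous downloads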

Recursively Downloading Websites

One of wget's most popular uses is fully mirroring websites for offline development and testing.

The --mirror argument downloads a site recursively with infinite depth and time-stamping, honoring the site's robots.txt rules by default:

wget --mirror https://classic.yarnpkg.com/en/docs

To produce a copy that browses correctly offline, also pass --page-requisites (download the images, CSS, and scripts each page needs) and --convert-links (rewrite links to point at the local copies):

wget --mirror --page-requisites --convert-links https://handlebarsjs.com

The whole folder structure is retained locally for fully offline static browsing. Images, CSS, and JS files are stored appropriately relative to the HTML files.

This technique is immensely useful for pulling down copies of documentation, frameworks, tutorials, and more for offline study.

Mirroring Only Certain Resources

When downloading large sites, you may want to target only specific file types in order to conserve bandwidth and disk space.

For example, to selectively mirror only PDF documents:

wget -r -A pdf https://www.healthit.gov/topic/safety

The -A flag accepts a comma-separated list of file name suffixes or wildcard patterns to accept. Combine this with the recursive flag to walk through the links on a domain, pulling only PDFs.

Similarly, reject unwanted content using --reject:

wget -r --reject "*.css,*.js" https://apidocs.chef.io/

Here CSS and JS files are omitted, letting you save the docs and images from Chef's API documentation for local viewing, minus the styling.

Such selectivity paired with wget‘s breadth makes capturing relevant subsets of data a breeze!

Advanced Download Options

So far we have covered basic usage of wget for common scenarios like mirroring websites and resuming downloads. Now let's explore some of the more advanced functionality and tweaks available.

Authentication Options

Many downloads require authentication before access is granted. Wget offers a few ways to handle this:

  • Basic Auth – Pass a username and password with --user and --password (add --auth-no-challenge to send them preemptively)
  • Client Certificates – Specify the certificate and key files with --certificate and --private-key
  • Bearer Tokens – Manually inject the Authorization header with --header

For example:

wget --header="Authorization: Bearer fdslkjfds23232..." https://api.acme.com/endpoint

This grants access by supplying the expected bearer token (here a JWT) directly, without going through a formal login flow.
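
With the built-in alias, the same header goes through Invoke-WebRequest's -Headers parameter; the endpoint and token below are placeholders:

wget -Uri "https://api.acme.com/endpoint" -Headers @{ Authorization = "Bearer <your-token>" } -UseBasicParsing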

Custom User Agents

Websites can identify and block traffic from wget based on its default User-Agent string.

Spoof a more typical browser agent string using the --user-agent argument:

wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..." https://www.amazon.com

This lets your automation scripts present the header values sites expect from a normal desktop browser.
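
The alias supports the same trick via the -UserAgent parameter (the agent string here is truncated for readability):

wget -Uri "https://www.amazon.com" -UserAgent "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" -OutFile amazon.html -UseBasicParsing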

Rate Limiting Bandwidth

To avoid saturating connections, use --limit-rate to throttle transfer speed. The value is interpreted in bytes per second, with k and m suffixes for kilobytes and megabytes (for example, --limit-rate=500k).

For example:

wget --limit-rate=3M https://cdnjs.com/libraries/lodash.js/4.17.21

Here the download is restricted to roughly 3 MB/s to avoid overwhelming the server.

Apply this cautiously across large collections of file transfers to reduce impact.

Password Protecting Archives

When downloading sensitive files, you may want to encrypt them or set a password for access.

Wget itself cannot do this: the --restrict-file-names option sometimes suggested for this purpose only controls how unsafe characters in remote file names are escaped, and has nothing to do with encryption. The practical approach is to download first, then wrap the files in a password-protected archive with a desktop tool such as 7-Zip called from the same PowerShell session. Keep in mind that an archive password protects files at rest on the filesystem; protecting them in transit is the job of HTTPS or SFTP.
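
A minimal sketch of that workflow, assuming 7-Zip's command-line executable (7z) is on your PATH; the URL, file names, and password are placeholders:

wget -Uri "https://fileserver.com/report.pdf" -OutFile report.pdf -UseBasicParsing
7z a '-pS3cr3t!' -mhe=on report.7z report.pdf     # encrypted archive; password is a placeholder
Remove-Item report.pdf                            # keep only the protected copy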

Common wget Pitfalls

While extremely useful, wget does come with some nuances to be aware of – especially when running in PowerShell.

Mangled Binary Responses

One issue developers commonly encounter is non-HTML content getting mangled when fetched through the wget alias.

By default, Invoke-WebRequest parses the response it receives (in Windows PowerShell 5.1 it even runs HTML through Internet Explorer's parsing engine), which can corrupt or choke on BIN files, PDFs, images and more.

The fix is to skip that parsing with -UseBasicParsing and write the raw bytes straight to disk with -OutFile:

wget -UseBasicParsing https://filebin.net/om1meobgydw8m04c/test.pdf -OutFile test.pdf

This saves the raw file contents unmodified.

Certificate Errors

Secure HTTPS connectivity can also break the download process with connection or certificate errors:

Invoke-WebRequest : Unable to connect to the remote server

For internal sites and corporate proxies that expect your Windows account, pass -UseDefaultCredentials so the request authenticates with your current domain credentials:

wget -UseDefaultCredentials https://mycompany.sharepoint.com

Note that this solves authentication, not certificate validation: a self-signed or untrusted certificate still needs to be trusted on the machine, or validation skipped explicitly as shown below.
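
If the failure really is an untrusted or self-signed certificate (common on internal test servers), PowerShell 7's Invoke-WebRequest can skip validation explicitly. Use it only for hosts you control; the URL below is a placeholder:

Invoke-WebRequest -Uri "https://intranet.local/report" -OutFile report.html -SkipCertificateCheck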

Access Denied

Despite proper credentials, you may encounter access errors to restricted resources:

Invoke-WebRequest : The remote server returned an error: (403) Forbidden.

Some systems block anything that looks like automated wget traffic. Work around this by passing a vanilla browser user agent string:

wget --user-agent="Mozilla/5.0 (Windows NT 10.0; ..." https://docs.mydomain.com

This lets your scripted session through by camouflaging it as a normal desktop browser request.

Tips and Tricks

Let's round out the guide with some helpful tips for mastering wget in your PowerShell developer toolkit:

Continue Broken Transfers

Resume failed partial downloads using -c:

wget -c https://fileserver.com/files/disk-image.vhd 

This continues the existing transfer instead of discarding past progress.

Follow Site Directory Structure

Mirror a remote path hierarchy locally by adding -np (--no-parent), which stops wget from climbing above the starting directory:

wget -r -np https://assets.cms.gov/resources/framework

The nested folder and file layout below that path is retained as-is.

Skip Already-Downloaded Files

Avoid cluttering disks and re-downloading files you already have by adding -nc (--no-clobber):

wget -r -nc https://sourceforge.net

Files whose names already exist locally are left untouched instead of being fetched again.

Pull Specific Sites

When recursing, restrict spidering to certain domains with -D, combined with -H so wget is allowed to span hosts at all:

wget -r -H -D site1.com,site2.com https://server.com

This restricts mirroring to resources living under site1.com and site2.com, avoiding anything else that happens to be linked.

Configure Retries

Handle flaky infrastructure by combining --tries (how many attempts to make) with --waitretry (the maximum number of seconds to back off between attempts):

wget --tries=3 --waitretry=10 https://volatile.ds

Here each failed request is retried up to 3 times, with a growing pause of up to 10 seconds between attempts, improving resilience.

Wrapping Up

That wraps up our deep dive into wielding the full power of wget in Windows PowerShell. While the utility originated on Linux, the built-in alias covers simple transfers from any Windows PowerShell prompt, and the full GNU build is only a package install away for everything else.

We explored simple single file downloads, resuming broken transfers, recursively mirroring directories, throttling speeds, automating authentication and more. You are now equipped with expert techniques for scripting robust download workflows.

Wget skills unlock the potential for connecting disjointed infrastructure, capturing internet resources for offline use, transferring bulk data, and backing up critical production artifacts in a pinch. Add this tool to your PowerShell arsenal to enable reliable downloads via code!
