One of the most frequently used utilities by sysadmins is wget. It can be very handy during web-related troubleshooting.

What is the wget command?

The wget command is a popular Unix/Linux command-line utility for fetching content from the web. It is free to use and provides a non-interactive way to download files from the web. The wget command supports HTTPS, HTTP, and FTP protocols out of the box. Moreover, you can also use HTTP proxies with it.

How does it help you troubleshoot?

There are many ways.

As a sysadmin, most of the time you’ll be working in a terminal, and when troubleshooting web application-related issues, you may not want to check the entire page but just the connectivity. Or, you may want to verify intranet websites, or download a certain page to verify its content.
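
For instance, if you only want to check connectivity without saving anything, the --spider option makes wget perform the request without downloading the page (the intranet URL below is just a placeholder):

wget -q --spider https://intranet.example.com && echo "reachable"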

wget is non-interactive, which means it can keep running even after you’ve logged off. There can be many instances where you need to disconnect from the system while a file is still being retrieved from the web. wget will keep running in the background and finish its assigned job.

It can also be used to mirror an entire website onto your local machine. It can follow links in XHTML and HTML pages to create a local version by downloading the pages recursively. This is very useful, as you can use it to download important pages or sites for offline viewing.
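
As a minimal sketch, mirroring a site for offline viewing could look like this (example.com is a placeholder; --convert-links rewrites links for local browsing and --page-requisites grabs the images and CSS the pages need):

wget --mirror --convert-links --page-requisites --no-parent https://example.com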

Let’s see them in action. The syntax of wget is as follows.

wget [option] [URL]

Download a webpage

Let’s try to download a page, e.g., github.com.

wget github.com

If the connectivity is fine, it will download the homepage and show output like the following.

root@host:~# wget github.com
URL transformed to HTTPS due to an HSTS policy
--2020-02-23 10:45:52--  https://github.com/
Resolving github.com (github.com)... 140.82.118.3
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html                                       [                                                                                         ] 131.96K  --.-KB/s    in 0.04s   

2020-02-23 10:45:52 (2.89 MB/s) - ‘index.html’ saved [135126]

root@host:~#
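
By default, the page is saved as index.html. If you prefer a different filename, you can pass one with -O, for example:

wget -O github.html github.com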

Download multiple files

This is handy when you have to download multiple files at once, and it can give you an idea of how to automate file downloads through scripts.

Let’s try to download the Python 3.8.1 and 3.5.1 files.

wget https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz

So, as you can guess, the syntax is as below.

wget URL1 URL2 URL3

Just make sure to separate the URLs with spaces.
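
If you download the same set of files regularly, you can also keep the URLs in a text file, one per line, and feed that file to wget with -i, which makes scripting easier (urls.txt is just an example filename):

wget -i urls.txt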

Limit download speed

This is useful when you want to check how long your file takes to download at different bandwidths.

Using the --limit-rate option, you can limit the download speed.

Here is the output of downloading a Node.js file.

root@host:~# wget https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
--2020-02-23 10:59:58--  https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
Resolving nodejs.org (nodejs.org)... 104.20.23.46, 104.20.22.46, 2606:4700:10::6814:162e, ...
Connecting to nodejs.org (nodejs.org)|104.20.23.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14591852 (14M) [application/x-xz]
Saving to: ‘node-v12.16.1-linux-x64.tar.xz’

node-v12.16.1-linux-x64.tar.xz               100%[===========================================================================================>]  13.92M  --.-KB/s    in 0.05s   

2020-02-23 10:59:58 (272 MB/s) - ‘node-v12.16.1-linux-x64.tar.xz’ saved [14591852/14591852]

It took 0.05 seconds to download the 13.92 MB file. Now, let’s try limiting the speed to 500K.

root@host:~# wget --limit-rate=500k https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
--2020-02-23 11:00:18--  https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz
Resolving nodejs.org (nodejs.org)... 104.20.23.46, 104.20.22.46, 2606:4700:10::6814:162e, ...
Connecting to nodejs.org (nodejs.org)|104.20.23.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14591852 (14M) [application/x-xz]
Saving to: ‘node-v12.16.1-linux-x64.tar.xz.1’

node-v12.16.1-linux-x64.tar.xz.1             100%[===========================================================================================>]  13.92M   501KB/s    in 28s     

2020-02-23 11:00:46 (500 KB/s) - ‘node-v12.16.1-linux-x64.tar.xz.1’ saved [14591852/14591852]

Reducing the bandwidth made the download take longer: 28 seconds. Imagine your users are complaining about slow downloads and you know their network bandwidth is low. You can quickly try --limit-rate to simulate the issue.
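
If you only care about the timing and don’t want to keep the file, you can discard the download while rate-limiting, for example:

wget --limit-rate=500k -O /dev/null https://nodejs.org/dist/v12.16.1/node-v12.16.1-linux-x64.tar.xz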

Download in the background

Downloading large files can take time, especially when you set a rate limit as in the example above. This is expected, but what if you don’t want to stare at your terminal?

Well, you can use the -b argument to start wget in the background.

root@host:~# wget -b https://slack.com
Continuing in background, pid 25430.
Output will be written to ‘wget-log.1’.
root@host:~#
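
The download progress is written to the log file mentioned in the output, so you can check on it whenever you like:

tail -f wget-log.1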

Ignore Certificate Error

This is handy when you need to check intranet web applications that don’t have a proper certificate. By default, wget throws an error when a certificate is not valid.

root@host:~# wget https://expired.badssl.com/
--2020-02-23 11:24:59--  https://expired.badssl.com/
Resolving expired.badssl.com (expired.badssl.com)... 104.154.89.105
Connecting to expired.badssl.com (expired.badssl.com)|104.154.89.105|:443... connected.
ERROR: cannot verify expired.badssl.com's certificate, issued by ‘CN=COMODO RSA Domain Validation Secure Server CA,O=COMODO CA Limited,L=Salford,ST=Greater Manchester,C=GB’:
  Issued certificate has expired.
To connect to expired.badssl.com insecurely, use `--no-check-certificate'.

The above example is for a URL whose certificate has expired. As you can see, wget suggests using --no-check-certificate, which skips certificate validation entirely.

root@host:~# wget https://untrusted-root.badssl.com/ --no-check-certificate
--2020-02-23 11:33:45--  https://untrusted-root.badssl.com/
Resolving untrusted-root.badssl.com (untrusted-root.badssl.com)... 104.154.89.105
Connecting to untrusted-root.badssl.com (untrusted-root.badssl.com)|104.154.89.105|:443... connected.
WARNING: cannot verify untrusted-root.badssl.com's certificate, issued by ‘CN=BadSSL Untrusted Root Certificate Authority,O=BadSSL,L=San Francisco,ST=California,C=US’:
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 600 [text/html]
Saving to: ‘index.html.6’

index.html.6                                 100%[===========================================================================================>]     600  --.-KB/s    in 0s      

2020-02-23 11:33:45 (122 MB/s) - ‘index.html.6’ saved [600/600]

root@host:~#

Cool, isn’t it?

HTTP Response Header

You can see the HTTP response headers of a given site right in the terminal.

Using -S prints the headers, as you can see below for Coursera.

root@host:~# wget https://www.coursera.org -S
--2020-02-23 11:47:01--  https://www.coursera.org/
Resolving www.coursera.org (www.coursera.org)... 13.224.241.48, 13.224.241.124, 13.224.241.82, ...
Connecting to www.coursera.org (www.coursera.org)|13.224.241.48|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Content-Type: text/html
  Content-Length: 511551
  Connection: keep-alive
  Cache-Control: private, no-cache, no-store, must-revalidate, max-age=0
  Date: Sun, 23 Feb 2020 11:47:01 GMT
  etag: W/"7156d-WcZHnHFl4b4aDOL4ZSrXP0iBX3o"
  Server: envoy
  Set-Cookie: CSRF3-Token=1583322421.s1b4QL6OXSUGHnRI; Max-Age=864000; Expires=Wed, 04 Mar 2020 11:47:02 GMT; Path=/; Domain=.coursera.org
  Set-Cookie: __204u=9205355775-1582458421174; Max-Age=31536000; Expires=Mon, 22 Feb 2021 11:47:02 GMT; Path=/; Domain=.coursera.org
  Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  X-Content-Type-Options: nosniff
  x-coursera-render-mode: html
  x-coursera-render-version: v2
  X-Coursera-Request-Id: NCnPPlYyEeqfcxIHPk5Gqw
  X-Coursera-Trace-Id-Hex: a5ef7028d77ae8f8
  x-envoy-upstream-service-time: 1090
  X-Frame-Options: SAMEORIGIN
  x-powered-by: Express
  X-XSS-Protection: 1; mode=block
  X-Cache: Miss from cloudfront
  Via: 1.1 884d101a3faeefd4fb32a5d2a8a076b7.cloudfront.net (CloudFront)
  X-Amz-Cf-Pop: LHR62-C3
  X-Amz-Cf-Id: vqvX6ZUQgtZAde62t7qjafIAqHXQ8BLAv8UhkPHwyTMpvH617yeIbQ==
Length: 511551 (500K) [text/html]
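
If you only want the headers and not the page itself, you can combine -S with --spider so nothing is saved:

wget -S --spider https://www.coursera.org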

Manipulate the User-Agent

There might be a situation where you want to connect to a site using a custom user agent, or a specific browser’s user agent. This is doable by specifying --user-agent. The example below sets the user agent to MyCustomUserAgent.

root@host:~# wget https://gf.dev --user-agent="MyCustomUserAgent"
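
To mimic a real browser instead, pass that browser’s user-agent string; the Chrome string below is just an illustrative example:

wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36" https://gf.dev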

Host Header

When an application is still in development, you may not have a proper URL to test it. Or, you may want to test an individual HTTP instance using its IP, but you need to supply the Host header for the application to work properly. In this situation, --header would be useful.

Let’s take the example of testing http://10.10.10.1 with the host header application.com.

wget --header="Host: application.com" http://10.10.10.1

Not just the Host header; you can inject any header you like.
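
For example, you can pass --header multiple times to send the Host header along with any other custom header (the X-Forwarded-For value here is only for illustration):

wget --header="Host: application.com" --header="X-Forwarded-For: 127.0.0.1" http://10.10.10.1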

Connect using Proxy

If you are working in a DMZ environment, you may not have access to Internet sites, but you can take advantage of a proxy to connect.

wget -e use_proxy=yes -e http_proxy=$PROXYHOST:PORT http://externalsite.com

Don’t forget to replace the $PROXYHOST:PORT placeholder with the actual proxy host and port.
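
Alternatively, since wget honors the standard proxy environment variables, you can export them once for the session (using the same placeholder host and port):

export http_proxy=http://$PROXYHOST:PORT
export https_proxy=http://$PROXYHOST:PORT
wget http://externalsite.com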

Connect using a specific TLS protocol

Usually, I would recommend using OpenSSL to test the TLS protocol, but you can use wget too.

wget --secure-protocol=TLSv1_2 https://example.com

The above will force wget to connect over TLS 1.2.
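
You can use other protocol values in the same way to check whether a server still accepts an older version, for example TLS 1.1 (the accepted values depend on your wget and SSL library build):

wget --secure-protocol=TLSv1_1 https://example.com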

Conclusion

Knowing the right commands can help you at work. I hope the above gives you an idea of what you can do with wget.