command line to retrieve files via FTP, HTTP and HTTPS

wget [option]... [URL]...

Non-interactive download of files using the HTTP, HTTPS, and FTP protocols.

Works in the background, allows running from cron. (consider --timeout=2 --tries=2 since the defaults are large.)

Can follow links in HTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site, referred to as recursive downloading.

Designed for robustness over slow or unstable network connections: if a download fails due to a network problem, Wget retries until the whole file has been retrieved.
If the server supports regetting, Wget instructs the server to continue the download from where it left off.




Logging and Input File Options
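-q, --quiet Turn off Wget's output.
-v, --verbose Turn on verbose output, with all the available data. This is the default.
-nv, --non-verbose Turn off verbose output without being completely quiet: error messages and basic information still get output.
-S, --server-response Print the headers sent by HTTP servers and the responses sent by FTP servers.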

-o logfile
Log messages to logfile. Default: STDERR.
-a logfile
Append messages to logfile instead of overwriting it.
-i file
Read URLs from file.
URLs on the command line are retrieved first.
The file need not be an HTML document.


-F, --force-html
When input is read from a file, treat it as HTML. Relative links can then be resolved by adding <base href="url"> to the HTML, or by using --base.

-B URL, --base=URL
With --force-html, prepends URL to relative links in the file specified by --input-file.
--[no]config=file Specify the location of a startup file
--rejected-log=logfile Logs all URL rejections to logfile as comma separated values. include the reason of rejection, the URL and the parent URL it was found in.
--no-netrc Do not obtain credentials from .netrc.
Download Options
-O file
Documents are not written to separate files; all are concatenated and written to file, or to STDOUT if - is specified.
Sets tries to 1.

-nc, --no-clobber
Suppresses creation of numbered copies of duplicate files (file.n).
Newer copies of file are not retrieved.
-nc may not be specified together with --timestamping.

Handling a file downloaded more than once in the same directory:
Without --timestamping, --no-clobber, or --recursive, downloading the same file again in the same directory preserves the original copy and names the second copy file.1, the third file.2, and so on.

With --recursive, but without --timestamping or --no-clobber, the last file downloaded is retained and overwrites the older copy.

With --timestamping, only a newer version of a file is downloaded.

When -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.

--backups=n Before overwriting a file, back up an existing file by adding a .1 suffix rotated to .2, .3, and so on, up to n
-N, --timestamping Turn on time-stamping.
--no-if-modified-since Do not send the If-Modified-Since header in --timestamping mode; send a preliminary HEAD request instead. Only affects --timestamping mode.
--no-use-server-timestamps Don't set the local file's timestamp to the one on the server.
By default, timestamps match the remote file, which allows the use of --timestamping on subsequent invocations of wget.
This option is useful when the local file's timestamp should instead record when the file was actually downloaded.

-c, --continue
Continue getting a partially-downloaded file.

You do not need this option for the current invocation of Wget to retry a download when the connection is lost midway; that is the default behavior.
-c only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still present.

A local file that's smaller than the server one will be considered part of an incomplete download and only remaining bytes will be downloaded and appended.

The server must support continued downloading via the Range header.

Using -c with -r will treat every existing file as an "incomplete download" candidate.

A garbled file will result if -c is used over HTTP through a proxy that inserts a "transfer interrupted" string into the local file.

--start-pos=offset Start downloading at zero-based position offset, expressed in bytes, in kilobytes with the k suffix, or in megabytes with the m suffix, etc. Takes precedence over --continue.
--progress=type
Select the progress indicator type: dot | bar. Default: bar.
bar draws an ASCII progress bar (aka thermometer display) indicating the status of retrieval.
If the output is not a TTY, the dot indicator is used by default.
Dot styles (--progress=dot:style):

    style     dot size   dots per cluster   dots per line   line size
    default   1K         10                 50              50K
    binary    8K         16                 48              384K
    mega      64K        8                  48              3M
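
The line size listed for each dot style is simply the dot size multiplied by the number of dots per line. A small sketch of that arithmetic follows; the helper name line_size_k is made up for illustration and is not part of wget.

```shell
# line_size_k is a hypothetical helper, not part of wget.
# It multiplies the dot size (in KB) by the number of dots per line
# and prints the resulting line size in KB.
line_size_k() {
    dot_k=$1
    dots_per_line=$2
    echo $((dot_k * dots_per_line))
}

line_size_k 1 50     # default style
line_size_k 8 48     # binary style
line_size_k 64 48    # mega style (3072 KB, i.e. 3M)
```

The mega result is in kilobytes, so it corresponds to the 3M shown for that style.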

The progress setting in .wgetrc is overridden from the command line. The exception is that when the output is not a TTY, the dot progress is favored over bar.
To force the bar output, use --progress=bar:force.
--show-progress Force wget to display the progress bar in any verbosity.
--spider Pages are only checked, not downloaded. Useful for checking bookmarks:
wget --spider --force-html -i bookmarks.html

--limit-rate=amount
Limit the download speed to amount, expressed in bytes (default), kilobytes (k suffix), or megabytes (m suffix) per second.
Example: --limit-rate=20k
Implemented by sleeping an appropriate amount of time after network reads. May not work with very small files. Specifying a bandwidth of less than a KBps or so may be ineffective.

-w seconds
Wait seconds between retrievals. Use the m suffix for minutes, h for hours, or d for days.
Specifying a large value is useful if the network or the destination host is down: Wget can wait long enough to reasonably expect the network error to be fixed before the retry.

--random-wait Causes the time between requests to vary between 0 and 2 * wait seconds, where wait is the value given with --wait.
-Y on|off, --proxy=on|off Turn proxy support on or off. On by default if the appropriate environment variable is defined.
-Q quota
Specify the download quota in bytes (default), kilobytes (k suffix), or megabytes (m suffix).
Does not affect downloading a single file.
The quota is respected when retrieving either recursively or from an input file list. Example: wget --quota=2m --input-file=retrive.lst
--bind-address=address Bind to address (hostname or IP address) on the local machine.
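-t number
Set the number of tries to number. Specify 0 for unlimited retrying.

--waitretry=seconds
Wait only between retries of failed downloads, rather than between every retrieval.
Uses linear backoff, waiting 1 second after the first failure on a given file, then 2 seconds after the second failure, and so on, up to the maximum of seconds. Therefore, a value of 10 will wait up to (1 + 2 + ... + 10) = 55 seconds per file.
On by default.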
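
The worst-case total wait under linear backoff can be checked with a short loop. This is only a sketch of the arithmetic described above; total_backoff is a made-up helper name, not a wget feature.

```shell
# total_backoff is a hypothetical helper, not part of wget.
# It sums the linear backoff delays 1 + 2 + ... + max (in seconds).
total_backoff() {
    max=$1
    total=0
    i=1
    while [ "$i" -le "$max" ]; do
        total=$((total + i))
        i=$((i + 1))
    done
    echo "$total"
}

total_backoff 10   # prints 55
```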
-T seconds
Set the read timeout to seconds. Default: 900 seconds (15 minutes!).
--connect-timeout=seconds Set the connect timeout for TCP connections.
--read-timeout=seconds Set the read (and write) timeout, i.e. the allowed idle time: if no data is received for more than seconds, reading fails and the download is restarted. Does not affect the duration of the entire download. Default: 900 seconds.
Directory Options

-nd, --no-directories
Do not create a hierarchy of directories when retrieving recursively. Files will be saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).

-x, --force-directories
The opposite of -nd: create a hierarchy of directories, even if one would not have been created otherwise. E.g. wget -x http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.

-nH, --no-host-directories
Disable generation of host-prefixed directories. By default, invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

--cut-dirs=number
Ignore number directory components. This is useful for getting fine-grained control over the directory where recursive retrieval will be saved.

For example, take the directory at ftp://ftp.xemacs.org/pub/xemacs/. If you retrieve it with -r, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the -nH option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. Using --cut-dirs makes Wget not see number remote directory components.

                   No options        -> ftp.xemacs.org/pub/xemacs/
                   -nH               -> pub/xemacs/
                   -nH --cut-dirs=1  -> xemacs/
                   -nH --cut-dirs=2  -> .
                   --cut-dirs=1      -> ftp.xemacs.org/xemacs/

If you just want to suppress the directory structure, this option is similar to a combination of -nd and -P. However, unlike -nd, --cut-dirs does not lose subdirectories: for instance, with -nH --cut-dirs=1, a beta/ subdirectory will be placed in xemacs/beta, as one would expect.
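
The mapping in the table above can be reproduced with a few lines of shell. This is only a sketch of the naming rule as described here, not wget's actual code; the function local_dir and its argument order are invented for illustration.

```shell
# local_dir is a hypothetical helper that mimics the table above; it is not wget code.
# Usage: local_dir HOST REMOTE_DIR NO_HOST(0|1) CUT_DIRS
local_dir() {
    host=$1
    dir=${2#/}      # strip a leading slash
    dir=${dir%/}    # strip a trailing slash
    no_host=$3
    cut=$4
    # Drop the first $cut directory components, one per iteration.
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case $dir in
            */*) dir=${dir#*/} ;;
            *)   dir= ;;
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" -eq 1 ]; then
        result=$dir
    else
        result=$host${dir:+/$dir}
    fi
    # An empty result means the current directory.
    if [ -n "$result" ]; then echo "$result/"; else echo "."; fi
}

local_dir ftp.xemacs.org pub/xemacs 0 0   # no options
local_dir ftp.xemacs.org pub/xemacs 1 0   # -nH
local_dir ftp.xemacs.org pub/xemacs 1 1   # -nH --cut-dirs=1
local_dir ftp.xemacs.org pub/xemacs 1 2   # -nH --cut-dirs=2
local_dir ftp.xemacs.org pub/xemacs 0 1   # --cut-dirs=1
```

The five calls correspond, in order, to the five rows of the table.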

-P prefix

The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree.
The default is . (the current directory).

HTTP Options

-E, --html-extension
If a file of type text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, .html will be appended to the local filename. Use when mirroring a remote site that uses .asp pages or CGIs. For example, the URL http://site.com/article.cgi?25 will be saved as article.cgi?25.html.
WARNING: filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget cannot tell that the local X.html file corresponds to remote URL X.
To prevent this, use --convert-links and --backup-converted so that the original version of the file will be saved as X.orig.
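
To see which names the suffix rule would touch, the regexp can be tried with grep. This is a sketch under the assumption that the pattern is anchored at the end of the name; needs_html_suffix is a made-up helper, and the check ignores the content-type condition (it assumes the document is text/html).

```shell
# needs_html_suffix is a hypothetical helper, not wget code.
# It succeeds (exit 0) when the name does NOT already end in .htm or .html
# (in any letter case), i.e. when .html would be appended.
needs_html_suffix() {
    if printf '%s\n' "$1" | grep -Eq '\.[Hh][Tt][Mm][Ll]?$'; then
        return 1
    fi
    return 0
}

for name in index.html page.HTM 'article.cgi?25'; do
    if needs_html_suffix "$name"; then
        echo "$name -> $name.html"
    else
        echo "$name -> unchanged"
    fi
done
```

Only article.cgi?25 is renamed, matching the example given above.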

-s, --save-headers Save the headers sent by the HTTP server to the file, before the actual contents, with an empty line as the separator.

-C on|off

If off, Wget sends the header Pragma: no-cache to disable server-side caching. Useful for retrieving and flushing out-of-date documents on proxy servers. Default: on.

--load-cookies file
Load cookies from file before the first HTTP retrieval. file is a text file in the format originally used by Netscape's cookies.txt file.
Use this option when mirroring sites that require you to be logged in to access their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The cookie is then resent by the browser when accessing that part of the site, and so proves your identity.

Mirroring such a site requires Wget to send the same cookies your browser sends when communicating with the site. This is achieved by --load-cookies: simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would send in the same situation. Different browsers keep textual cookie files in different locations:
Netscape 4.x: The cookies are in ~/.netscape/cookies.txt.
Mozilla and Netscape 6.x: Mozilla's cookie file is also named cookies.txt, located somewhere under ~/.mozilla, in the directory of your profile. The full path usually ends up looking somewhat like ~/.mozilla/default/some-weird-string/cookies.txt.
Internet Explorer: You can produce a cookie file Wget can use by using the File menu, Import and Export, Export Cookies.
Other browsers: If you are using a different browser to create your cookies, --load-cookies will only work if you can locate or produce a cookie file in the Netscape format that Wget expects.
If you cannot use --load-cookies, there might still be an alternative. If your browser supports a cookie manager, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the official cookie support:

wget --cookies=off --header "Cookie: name=value"

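--save-cookies file
Save cookies to file at the end of the session. Cookies whose expiry time is not specified, or those that have already expired, are not saved.

--ignore-length Some CGI programs send out incorrect "Content-Length" headers; with this option, Wget ignores the Content-Length header.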

--header=additional-header
Define an additional header to be passed to the HTTP servers. Headers must contain a : preceded by one or more non-blank characters, and must not contain newlines.
Define more than one additional header by specifying --header more than once.

       wget --header='Accept-Charset: iso-8859-2' \
            --header='Accept-Language: hr'        \
            http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.


--proxy-user=user
--proxy-passwd=password
Specify the username and password for authentication on a proxy server. Wget encodes them using the "basic" authentication scheme.

--http-user=user
--http-passwd=password
Specify the username and password on an HTTP server. According to the type of the challenge, Wget will encode them using either the "basic" (insecure) or the "digest" authentication scheme.

Another way to specify username and password is in the URL itself. Either method reveals the password to anyone who runs ps. To prevent the passwords from being seen, store them in .wgetrc or .netrc, and make sure to protect those files with chmod.
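
A minimal sketch of protecting such a credentials file follows. It writes a throwaway netrc-style file in a temporary directory so that no real ~/.netrc is touched; the host, user name, and password are placeholders, and the variable names are chosen only for this illustration.

```shell
# Sketch only: create a netrc-style file in a temporary directory.
# unix.server.com, myuname and mypassword are placeholder values.
demo_dir=$(mktemp -d)
netrc="$demo_dir/netrc"

umask 077            # new files get no group/other permissions
cat > "$netrc" <<'EOF'
machine unix.server.com
login myuname
password mypassword
EOF
chmod 600 "$netrc"   # owner may read and write; nobody else has access

ls -l "$netrc"       # the mode column should start with -rw-------
```

For a real setup the same chmod 600 would be applied to ~/.netrc or ~/.wgetrc; remove the temporary directory once done with the demonstration.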

--referer=url Include 'Referer: url' header in HTTP request.
-U agent-string

Identify as agent-string to the HTTP server in the "User-Agent" header field. Default: Wget/version.

FTP Options


An FTP URL without a filename creates an HTML-formatted directory listing, index.html, including complete <a href … links.

-nr, --dont-remove-listing
Don't remove the .listing files generated by FTP retrievals, which contain the raw directory listings.
Normally wget takes the directory listing, creates an HTML page including complete <a href … links, then deletes .listing.
-g on|off
Turn FTP globbing on or off. Globbing uses shell-like special characters (wildcards) such as *, ?, [ and ] to retrieve more than one file:

wget "ftp://gnjilux.srk.fer.com/mail/*.msg"

On by default.
Quote the URL to protect it from being expanded by the shell.
--passive-ftp Use passive FTP mode, in which the client initiates the data connection.

--retr-symlinks
By default, when retrieving FTP directories recursively and a symbolic link is encountered, the linked-to file is not downloaded. Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless this recursive retrieval would have encountered it separately and downloaded it anyway.
When --retr-symlinks is specified, symbolic links are traversed and the pointed-to files are retrieved. This option does not cause Wget to traverse symlinks to directories and recurse through them.
When retrieving a file (not a directory) because it was specified on the command line, rather than because it was recursed to, this option has no effect. Symbolic links are always traversed in this case.

Recursive Retrieval Options

-r, --recursive Turn on recursive retrieving.
-l depth
Specify the maximum recursion depth. The default is 5.
--delete-after
Delete every file after it has been downloaded. Useful for pre-fetching popular pages through a proxy, e.g.:

wget -r -nd --delete-after http://whatever.com/~popular/page/

The -r option is to retrieve recursively, and -nd to not create directories. --delete-after deletes files on the local machine. It does not issue the DELE command to remote FTP sites.
When --delete-after is specified, --convert-links is ignored, so .orig files are not created.


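-k, --convert-links
Convert the links in the document to make them suitable for local viewing. This affects visible hyperlinks, as well as any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc. Each link is changed in one of two ways: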
  1. links to files that have been downloaded will be changed to refer to the file they point to as a relative link.
    Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.
  2. links to files that have not been downloaded by Wget will be changed to include host name and absolute path.
    Example: if the downloaded file /foo/doc.html links to /bar/img.gif it will be modified to point to http://hostname/bar/img.gif.

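The conversion is performed at the end of all the downloads.

-K, --backup-converted
When converting a file, back up the original version with an .orig suffix.
Affects the behavior of --timestamping.

-m, --mirror
Turn on options suitable for mirroring, i.e. recursion and time-stamping; sets unlimited recursion depth and keeps FTP directory listings.
Equivalent to -r --timestamping --level inf -nr.

-p, --page-requisites
Download all the files necessary to properly display a given HTML page, including inlined images, sounds, and referenced stylesheets.
Using -r together with --level can help, but Wget does not ordinarily distinguish between external and inlined documents, so leaf documents are often left without their requisites.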

For example:
1.html contains an <IMG> referencing 1.gif and an <A> pointing to external document 2.html.
2.html has image 2.gif and it links to 3.html with image 3.gif.

wget --recursive --level 2 http://site/1.html

1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
However, 3.gif is not downloaded, because Wget simply counts the number of hops (up to 2) away from 1.html in order to determine where to stop the recursion.

wget -r --level 2 --page-requisites http://site/1.html

all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,

wget -r --level 1 --page-requisites http://site/1.html

will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One might think that

wget -r --level 0 --page-requisites http://site/1.html

would download just 1.html and 1.gif, but it does not, because -l 0 is equivalent to -l inf, that is, infinite recursion. To download a single HTML page (specified on the command line or in a -i URL input file) and its requisites, omit -r and -l:

wget -p http://site/1.html

Wget will behave as if -r had been specified, but only that single page and its requisites will be downloaded.
Links from that page to external documents will not be followed.

To download a single page and all its requisites (even if they exist on separate websites), use:

wget --html-extension --span-hosts --convert-links --backup-converted --page-requisites http://site/document

An external document link is any URL in an <A>, <AREA> or a <LINK> tag other than <LINK REL="stylesheet">.

Recursive Accept/Reject Options

-A acclist
--accept acclist
-R rejlist
--reject rejlist

Comma-separated lists of file name suffixes or patterns to accept or reject.

--follow-tags=list
Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval.
To specify a subset of tags to be considered, give them in a comma-separated list.

-G list
The reverse of --follow-tags: specify a comma-separated list of tags to be ignored.

-H, --span-hosts Enable spanning across hosts when doing recursive retrieving.

-D list

Set domains to be followed. list is a comma-separated list of domains.
This does not turn on -H.

--exclude-domains list Specify the domains not to be followed.
-I list
-X list

Specify a comma-separated list of directories to follow (-I) or exclude (-X) when downloading. Elements of list may contain wildcards.
-np, --no-parent Do not ascend to the parent directory when retrieving recursively.

-L, --relative Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts.
--follow-ftp Follow FTP links from HTML documents. Default: ignore all the FTP links.

-b, --background Go to background immediately after startup. If no output file is specified via -o, output is redirected to wget-log.
-e command
--execute command

Execute command as if it were a part of .wgetrc. The command is executed after the commands in .wgetrc.
-V, --version Display the version of Wget.
-h, --help Print a help message describing all of Wget's command-line options.
--input-metalink=file Downloads files covered in local Metalink file. Metalink version 3 and 4 are supported.
--keep-badhash Keeps downloaded Metalink's files with a bad hash. It appends .badhash to the name of Metalink's files which have a checksum mismatch, except without overwriting existing files.
--metalink-over-http Issues HTTP HEAD request instead of GET and extracts Metalink metadata from response headers. Then it switches to Metalink download. If no valid Metalink metadata is found, it falls back to ordinary HTTP download. Enables Content-Type: application/metalink4+xml files download/processing.
--metalink-index=number Set the Metalink application/metalink4+xml metaurl ordinal NUMBER. From 1 to the total number of "application/metalink4+xml" available. Specify 0 or inf to choose the first good one. Metaurls, such as those from a --metalink-over-http, may have been sorted by priority key's value; keep this in mind to choose the right NUMBER.
--preferred-location Set preferred location for Metalink resources. This has effect if multiple resources with same priority are available.
--xattr Enable use of the file system's extended attributes to save the original URL and the Referer HTTP header value, if used.
Note that the URL might contain private information like access tokens or credentials.
--dns-timeout=seconds Set the DNS lookup timeout.
--bind-dns-address=address Overrides the route for DNS requests. address may be IPv4 or IPv6. Wget needs to be built with libcares for this option.
--dns-servers=addresses The given address(es) override the standard nameserver addresses. addresses may be IPv4 or IPv6, comma-separated. Wget needs to be built with libcares for this option.
-d, --debug Turn on debug output. Works only if Wget was compiled with debug support.

Respects the Robot Exclusion file (/robots.txt).

Can convert the links in downloaded HTML files to the local files for offline viewing.


  1. Download a URL:
    wget http://fly.srk.fer.com/

  2. Work in the background (notice trailing &) and write progress to a log
    wget --output-file=wgetjpg.log http://fly.srk.fer.hr/jpg/flyweb.jpg &
  3. Retrieve a directory listing, parse it and convert it to HTML, then view it:
    wget ftp://prep.ai.mit.edu/pub/gnu/
    links index.html
  4. Use a file containing URLs to download:
    wget --input-file=urlfile

  5. Create a mirror with the same directory structure as the original, five levels deep (the default --level), logging activities:
    wget --recursive http://www.gnu.org/ --output-file=gnuwget.log
  6. Convert links in the HTML files to point to local files, to enable viewing the documents off-line:
    wget --convert-links --recursive http://www.gnu.org/docs

  7. Retrieve a page, including all the elements needed for it to be displayed, such as inline images and external style sheets, and make the downloaded page refer to the downloaded copies.
    Files will be stored in the www.server.com/ directory.
    wget --page-requisites --convert-links http://www.server.com/dir/page.html

  8. Same as above, but save all files under a download/ subdirectory of the current directory:
    wget --page-requisites --convert-links --no-host-directories --no-directories --directory-prefix=download http://www.server.com/dir/page.html

  9. Retrieve a page and show the server headers:
    wget --server-response http://www.lycos.com/

  10. Save the server headers with the file, for post-processing:
    wget -s http://www.lycos.com/
    more index.html

  11. Retrieve the first two levels of files to /tmp.
    wget --recursive --level=2 --directory-prefix=/tmp ftp://wuarchive.wustl.edu/

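  12. Download all the GIFs from a directory on an HTTP server. wget http://www.server.com/dir/*.gif fails because HTTP retrieval does not support globbing; use recursion with an accept list instead:
    wget --recursive --level=1 --no-parent --accept .gif http://www.server.com/dir/

    -A "*.gif" would have worked too.

  13. Resume an interrupted download without clobbering the files already present:
    wget --no-clobber --recursive http://www.gnu.org/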

  14. To include a username and password, use the appropriate URL syntax:
    wget ftp://myuname:mypassword@unix.server.com/docs/expenses.xls

  15. Output documents to standard output instead of to files:
    wget --output-document=- http://www.srce.com/

  16. Combine two retrievals and use pipelines to retrieve the documents from remote hotlists:
    wget --output-document=- http://cool.list.com/hotfiles.txt | wget --force-html --input-file=-

  17. To keep a mirror of a page (or FTP subdirectories), use --mirror (-m), which is the shorthand for -r -l inf --timestamping. Use a crontab entry to recheck a site each Sunday, logging the progress:
    0 0 * * 0 wget --mirror http://www.gnu.org/docs --output-file=/home/me/weeklog

  18. As above, but with links converted for local viewing. Because link conversion doesn't play well with timestamping, also have Wget back up the original HTML files before the conversion:
    wget --mirror --convert-links --backup-converted http://www.gnu.org/ -o /home/me/weeklog

  19. Local viewing doesn't work well when HTML files are saved under extensions other than .html, so also rename all the files served with content-type text/html to name.html:
    wget --mirror --convert-links --backup-converted --html-extension -o /home/me/weeklog http://www.gnu.org/

    Or, with less typing:

    wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog


/usr/local/etc/wgetrc Default location of the global startup file.
.wgetrc User startup file.

This document was taken from: GNU Wget 1.8.2 2003-01-25 WGET(1)
and reworked for terseness and HTML formatting by Dennis German


 GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE     log messages to FILE.
  -a,  --append-output=FILE   append messages to FILE.
  -d,  --debug                print debug output.
  -q,  --quiet                quiet (no output).
  -v,  --verbose              be verbose (this is the default).
  -nv, --non-verbose          turn off verboseness, without being quiet.
  -i,  --input-file=FILE      download URLs found in FILE.
  -F,  --force-html           treat input file as HTML.
  -B,  --base=URL             prepends URL to relative links in -F -i file.
       --sslcertfile=FILE     optional client certificate.
       --sslcertkey=KEYFILE   optional keyfile for this certificate.
       --egd-file=FILE        file name of the EGD socket.

Download:
       --bind-address=ADDRESS   bind to ADDRESS (hostname or IP) on local host.
  -t,  --tries=NUMBER           set number of retries to NUMBER (0 unlimits).
  -O   --output-document=FILE   write documents to FILE.
  -nc, --no-clobber             don't clobber existing files or use .# suffixes.
  -c,  --continue               resume getting a partially-downloaded file.
       --progress=TYPE          select progress gauge type.
  -N,  --timestamping           don't re-retrieve files unless newer than local.
  -S,  --server-response        print server response.
       --spider                 don't download anything.
  -T,  --timeout=SECONDS        set the read timeout to SECONDS.
  -w,  --wait=SECONDS           wait SECONDS between retrievals.
       --waitretry=SECONDS      wait 1...SECONDS between retries of a retrieval.
       --random-wait            wait from 0...2*WAIT secs between retrievals.
  -Y,  --proxy=on/off           turn proxy on or off.
  -Q,  --quota=NUMBER           set retrieval quota to NUMBER.
       --limit-rate=RATE        limit download rate to RATE.

Directories:
  -nd  --no-directories            don't create directories.
  -x,  --force-directories         force creation of directories.
  -nH, --no-host-directories       don't create host directories.
  -P,  --directory-prefix=PREFIX   save files to PREFIX/...
       --cut-dirs=NUMBER           ignore NUMBER remote directory components.

HTTP options:
       --http-user=USER      set http user to USER.
       --http-passwd=PASS    set http password to PASS.
  -C,  --cache=on/off        (dis)allow server-cached data (normally allowed).
  -E,  --html-extension      save all text/html documents with .html extension.
       --ignore-length       ignore `Content-Length' header field.
       --header=STRING       insert STRING among the headers.
       --proxy-user=USER     set USER as proxy username.
       --proxy-passwd=PASS   set PASS as proxy password.
       --referer=URL         include `Referer: URL' header in HTTP request.
  -s,  --save-headers        save the HTTP headers to file.
  -U,  --user-agent=AGENT    identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive  disable HTTP keep-alive (persistent connections).
       --cookies=off         don't use cookies.
       --load-cookies=FILE   load cookies from FILE before session.
       --save-cookies=FILE   save cookies to FILE after session.

FTP options:
  -nr, --dont-remove-listing   don't remove `.listing' files.
  -g,  --glob=on/off           turn file name globbing on or off.
       --passive-ftp           use the "passive" transfer mode.
       --retr-symlinks         when recursing, get linked-to files (not dirs).

Recursive retrieval:
  -r,  --recursive          recursive web-suck -- use with care!
  -l,  --level=NUMBER       maximum recursion depth (inf or 0 for infinite).
       --delete-after       delete files locally after downloading them.
  -k,  --convert-links      convert non-relative links to relative.
  -K,  --backup-converted   before converting file X, back up as X.orig.
  -m,  --mirror             shortcut option equivalent to -r -N -l inf -nr.
  -p,  --page-requisites    get all images, etc. needed to display HTML page.

Recursive accept/reject:
  -A,  --accept=LIST                comma-separated list of accepted extensions.
  -R,  --reject=LIST                comma-separated list of rejected extensions.
  -D,  --domains=LIST               comma-separated list of accepted domains.
       --exclude-domains=LIST       comma-separated list of rejected domains.
       --follow-ftp                 follow FTP links from HTML documents.
       --follow-tags=LIST           comma-separated list of followed HTML tags.
  -G,  --ignore-tags=LIST           comma-separated list of ignored HTML tags.
  -H,  --span-hosts                 go to foreign hosts when recursive.
  -L,  --relative                   follow relative links only.
  -I,  --include-directories=LIST   list of allowed directories.
  -X,  --exclude-directories=LIST   list of excluded directories.
  -np, --no-parent                  don't ascend to the parent directory.

Mail bug reports and suggestions to <bug-wget@gnu.org>.