|Specify the location of a startup file
|Logs all URL rejections to logfile as comma-separated values, including the reason for rejection, the URL, and the
parent URL it was found in.
|Do not try to obtain credentials from
|documents will be concatenated together and written to |
file, or to STDOUT if
- is specified.
tries to 1.
Suppresses creating versions of duplicate files named |
Newer copies of file are not retrieved.
-nc may not be specified at the same time as --timestamping.
Handling a file downloaded more than once in the same directory:
Without --timestamping, --no-clobber, or --recursive, downloading the same file
in the same directory will result in the original copy of file
being preserved and the second copy being named file.1.
With --recursive, but without --timestamping or --no-clobber,
the new copy overwrites the old, so the last file downloaded is retained.
With --timestamping, only a newer version of a file is downloaded.
When -nc is specified, files with the suffixes .html or
.htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
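For example (hostname and path hypothetical), running the following command twice leaves the first copy untouched instead of downloading it again:
wget -nc http://example.com/pub/ls-lR.gz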
Before overwriting a file, back up the existing file by adding a .1 suffix; backups are rotated to .2, .3, and so on, up to |
|Turn on time-stamping.
Do not send the If-Modified-Since header in --timestamping mode; send a preliminary HEAD request instead. Only affects |
Don't set the local file's timestamp to the one on the server.|
By default, timestamps match the remote file,
which allows the use of --timestamping on subsequent invocations of wget.
This option is useful when the local file's timestamp should reflect when it was actually downloaded.
|Continue a partially-downloaded file.
You need not specify this option to have the current invocation of Wget retry a download should the connection be lost midway through; that is the default behavior.
Only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still present.
A local file that's smaller than the server one will be considered part of an incomplete download, and only
the remaining bytes will be downloaded and appended.
The server must support continued downloading via the Range header.
Combined with -r, every file will be interpreted as an "incomplete download".
A garbled file will result if an HTTP proxy inserts a "transfer interrupted" string into the local file.
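A typical resume of a download left half-finished by an earlier invocation (URL hypothetical):
wget -c http://example.com/big.iso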
Start downloading at zero-based position OFFSET, expressed in bytes, kilobytes with the `k' suffix, or
megabytes with the `m' suffix, etc.
has higher precedence over |
draws an ASCII progress bar (a.k.a. thermometer display) indicating the status of retrieval.
If the output is not a TTY, the
dot indicator will be used by default.
|style ||dot ||dots per cluster ||dots in a line ||line
|default ||1K ||10 ||50 ||50K
|binary ||8K ||16 ||48 ||384K
|mega ||64K ||8 ||48 ||3M
A default style set in .wgetrc is overridden from the command line, except that when the output is not a TTY, the
dot progress will be favored over bar.
To force the bar output, use --progress=bar:force.
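For example, to keep the bar display even while logging to a file (URL and log name hypothetical):
wget --progress=bar:force -o transfer.log http://example.com/large.tar.gz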
| Force wget to display the progress bar in any verbosity.
| Pages are only checked, not downloaded. Useful for checking bookmarks:
wget --spider --force-html -i bookmarks.html
| Expressed in bytes (default), |
Implemented by sleeping an appropriate amount of time after network reads.
Does not work well with very small files. Specifying a bandwidth of less than one KBps may be ineffective.
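For example, to cap the transfer rate at roughly 200 KB/s (URL hypothetical):
wget --limit-rate=200k http://example.com/file.iso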
|Wait between retrievals. Specify minutes with the |
m suffix, hours with
h, or days with d.
Specifying a large value is useful if the network
or the destination host is down. Wget can wait long enough
to reasonably expect the network error to be fixed before the retry.
|causes the time between requests to vary between 0 and 2 * wait seconds,
where wait is specified using the |
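A polite recursive crawl combining both options might look like this (site hypothetical):
wget -r --wait=2 --random-wait http://example.com/docs/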
--proxy=on/off, -Y on/off
|On by default if the appropriate environment variable is defined.
|bytes (default), kilobytes with the |
k suffix, or megabytes with the m suffix.
Does not affect downloading a single file.
Applies when retrieving either recursively, or from an input
file list. Example:
wget --quota=2m --input-file=retrieve.lst
| Bind to the specified interface hostname or IP address.
|Only wait between retries of failed downloads. |
Uses linear backoff, waiting 1 second after the first failure on a given file, then 2 seconds, and so on, up to the maximum number of seconds you specify.
Therefore, a value of 10 will wait up to (1 + 2 + ... + 10) = 55 seconds per file.
On by default.
|read timeout. Default |
900 seconds (15 minutes!).
| Set the connect timeout for TCP connections
Set the read (and write) timeout. The timeout refers to idle time: if
no data is received for more than the specified number of
seconds, reading fails and the download is restarted.
Does not directly affect the duration of the entire download.
Default 900 seconds.
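The three timeouts can also be set individually in one command (values and URL are illustrative only):
wget --dns-timeout=10 --connect-timeout=10 --read-timeout=60 http://example.com/file.zip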
| Directory Options
|Do not create a hierarchy of directories when retrieving recursively; files will be saved to
the current directory, without clobbering (if a name shows up more
than once, the filenames will get extensions .n).
|The opposite of |
-nd: create a hierarchy of directories, even if
one would not have been created otherwise. E.g.
http://fly.srk.fer.hr/robots.txt will save the downloaded file to fly.srk.fer.hr/robots.txt.
|Disable generation of host-prefixed directories. By default,
invoking Wget with |
-r http://fly.srk.fer.hr/ will create a structure of directories beginning with
fly.srk.fer.hr/ . This option
disables such behavior.
|Ignore number directory components. This is useful for getting a
fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at
ftp://ftp.xemacs.org/pub/xemacs/. If you retrieve it with
-r, it will be saved locally under
ftp.xemacs.org/pub/xemacs/. While the
-nH option can remove the
ftp.xemacs.org/ part, you are still stuck with pub/xemacs/. --cut-dirs makes
Wget not "see" number remote directory components.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
To suppress the directory structure, this option
is similar to a combination of
-nd and -P. However, unlike -nd,
--cut-dirs does not lose subdirectories; for instance, with
-nH --cut-dirs=1, a beta/ subdirectory will be placed at
xemacs/beta, as one would expect.
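Putting it together, the following saves the xemacs tree into the current directory, dropping the host and pub/xemacs/ prefixes:
wget -r -np -nH --cut-dirs=2 ftp://ftp.xemacs.org/pub/xemacs/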
|The directory prefix is the directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. |
The default is
. (the current directory).
|If a file of type |
text/html is downloaded and the URL does not end
with the regexp \.[Hh][Tt][Mm][Ll]?, the suffix
.html will be appended to the local filename. Useful when mirroring a remote site that uses
.asp pages or CGIs. A URL like http://site.com/article.cgi?25
will be saved as article.cgi?25.html.
WARNING: filenames changed in this way will be re-downloaded every
time you re-mirror a site.
To prevent this, use -k and -K
so that the original version of the file will be saved with a .orig suffix.
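For example (site and page hypothetical), this saves a dynamically generated page with a .html suffix while keeping .orig backups for later re-mirroring:
wget -E -k -K http://example.com/page.asp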
|Save the headers sent by the HTTP server to the file, preceding the actual contents, with an empty line as the separator.
Sends Pragma: no-cache to disable server-side caching, fetching the file from the remote server rather than a cached copy. Useful for retrieving out-of-date documents on proxy servers. Default: on.
|Load cookies from file before the first HTTP retrieval.
file is a textual file in the format originally used by Netscape's cookies.txt file.
Use this option when mirroring sites that
require that you be logged in to access their content. The login process typically works by the web server issuing
an HTTP cookie upon receiving and verifying your credentials. The
cookie is then resent by the browser when accessing that part of
the site, and so sets your identity.
Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved
with --load-cookies: simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would
send in the same situation. Different browsers keep textual cookie
files in different locations:
Netscape 4.x: the cookies are in ~/.netscape/cookies.txt.
Mozilla and Netscape 6.x:
Mozilla's cookie file is also named cookies.txt, located somewhere under
~/.mozilla, in the directory of your profile. The
full path usually ends up looking somewhat like
Internet Explorer: you can produce a cookie file Wget can use by using the File
menu, Import and Export, Export Cookies.
If you are using a different browser to create your cookies,
--load-cookies will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
If you cannot use
--load-cookies, there might still be an alternative. If your browser supports a cookie manager, you can use
it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually
instruct Wget to send those cookies, bypassing the official cookie support:
wget --cookies=off --header "Cookie: name=value"
|Cookies whose expiry time is not specified, or those that have already expired, are not saved.
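Since session cookies carry no expiry time, add --keep-session-cookies when saving them. A common pattern (site, form fields, and paths hypothetical) is to log in once, save the cookies, then reuse them:
wget --save-cookies cookies.txt --keep-session-cookies --post-data 'user=foo&password=bar' http://example.com/login.php
wget --load-cookies cookies.txt -p http://example.com/members/article.php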
|CGI programs send out incorrect "|
|passed to the HTTP servers.
Headers must contain a |
: preceded by one or more non-blank characters, and must not contain newlines.
Define more than one additional header by specifying
--header more than once.
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
  http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
|for authentication on a proxy server. Encoded using the "basic" authentication scheme.
|According to the type of the challenge, Wget will encode them using
either the "basic" (insecure) or the "digest" authentication scheme.
Another way to specify username and password is in the URL itself.
Either method reveals the password to anyone who runs
ps. To prevent the passwords from being seen, store them in
.netrc, and make sure to protect those files with "chmod".
Include `Referer: url' header in HTTP request.
|Identify as agent-string to the HTTP server, via the
"User-Agent" header field.
|without a filename, creates an HTML-formatted directory listing |
index.html including complete
<a href …
|Don't remove |
.listing files generated by FTP retrievals, which contain the raw directory listings.
Wget takes the directory listing and creates an HTML page including complete
<a href … links.
|Use shell-like special characters (wildcards), like |
to retrieve more than one file.
On by default.
Quote the URL to protect it from being expanded by the shell.
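For example (server and pattern hypothetical), the quotes keep the shell from expanding the wildcard itself:
wget "ftp://example.com/pub/*.tar.gz"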
|the client initiates the data connection.
|When retrieving FTP directories recursively and a symbolic
link is encountered, the linked-to file is not downloaded.
Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless the recursive retrieval would have encountered it separately and downloaded it anyway.
When --retr-symlinks is specified, symbolic links are traversed and the pointed-to files are retrieved. This
option does not cause Wget to traverse symlinks to directories and
recurse through them.
When retrieving a file (not a directory) because it was
specified on the commandline, rather than because it was recursed
to, this option has no effect. Symbolic links are always traversed
in this case.
Recursive Retrieval Options
| Turn on recursive retrieving.
|Specify recursion maximum depth. Default is 5.
|useful for pre-fetching popular pages through a proxy, e.g.: |
wget -r -nd --delete-after http://whatever.com/~popular/page/
The -r option is to retrieve recursively, and
-nd to not create directories.
--delete-after deletes files on the local machine. It does not issue the DELE command to remote FTP sites.
When --delete-after is specified,
--convert-links is ignored, so .orig files are not created in the first place.
|Fix links for local viewing. Affects visible hyperlinks, as well as any part of the document that links to
external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc. |
- links to files that have been downloaded will be changed to refer to the file they point to as a relative link.
Example: if the downloaded file
/foo/doc.html links to
/bar/img.gif, also downloaded, then the link in doc.html will
be modified to point to ../bar/img.gif.
- links to files that have not been downloaded by Wget will
be changed to include host name and absolute path.
Example: if the downloaded file
/foo/doc.html links to
/bar/img.gif, it will be modified to point to http://hostname/bar/img.gif.
Performed at the end of all the downloads.
|When converting a file, back up the original version with |
Affects the behavior of -N.
|Turn on options suitable for mirroring, i.e. recursion and time-stamping, sets unlimited recursion depth and
keeps FTP directory listings. |
Currently equivalent to -r -N --level inf -nr.
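For example (host hypothetical):
wget -m http://example.com/
is the same as spelling out the individual options listed above.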
|Download files necessary to display the HTML page including
inlined images, sounds, and referenced stylesheets. |
Using --recursive together with --level can help.
Suppose 1.html contains an <IMG> tag referencing
1.gif and an
<A> tag pointing to external document
2.html. 2.html has image
2.gif and it links to
3.html with image 3.gif.
wget --recursive --level 2 http://site/1.html
1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
3.gif is not downloaded because Wget counts the number of hops (up to 2) away from
1.html in order to determine where to stop the recursion.
wget -r --level 2 --page-requisites http://site/1.html
all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,
wget -r --level 1 --page-requisites http://site/1.html
will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.
wget -r --level 0 --page-requisites http://site/1.html
Does not download just 1.html and 1.gif, because
-l 0 is equivalent to
-l inf, that is, infinite
recursion. To download a single HTML page,
specified on the commandline or in a
-i URL input file, and its requisites, leave off -r and -l:
wget -p http://site/1.html
Wget will behave as if
-r had been specified, but only that single page and its requisites will be downloaded.
Links from that page to external documents will not be followed.
To download a single page and all its requisites (even if they exist on separate websites), use:
wget --html-extension --span-hosts --convert-links --backup-converted --page-requisites http://site/document
An external document link is any URL in an <A>,
<AREA> or a <LINK> tag other than <LINK REL="stylesheet">.
Recursive Accept/Reject Options
|Comma-separated lists of file name suffixes or patterns to
accept or reject.
|Wget has an internal table of HTML |
tag / attribute pairs that it
considers when looking for linked documents during a recursive retrieval.
To specify a subset of tags to be considered, give them in a comma-separated list.
To specify tags to be ignored, use
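For example (site hypothetical), restricting recursion to plain <a> links only:
wget -r --follow-tags=a http://example.com/index.html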
|Enable spanning across hosts when doing recursive retrieving.
|Set domains to be followed. list is a comma-separated list of domains. |
This does not turn on -H.
|Specify the domains not to be followed.
| Specify a comma-separated list of directories to follow/exclude
when downloading. Elements of list may contain wildcards.
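For example (host and directory names hypothetical):
wget -r -I /pub,/docs ftp://example.com/
wget -r -X /tmp,/cache http://example.com/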
| Do not ascend to the parent directory when retrieving recursively.
|Follow relative links only. Useful for retrieving a specific home
page without any distractions, not even those from the same hosts.
|Follow FTP links from HTML documents. Default: ignore all the FTP links.
|Go to background immediately after startup. If no output file is
specified via the |
-o, output is redirected to wget-log.
|Execute command after the commands in |
|Display the version of Wget. |
|Print a help message describing all of Wget's command-line options.|
|Downloads files covered in a local Metalink file. Metalink versions 3 and 4 are supported.
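For example, assuming Wget was built with Metalink support (the file name is hypothetical):
wget --input-metalink=example.meta4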
|Keeps downloaded Metalink's files with a bad hash. It appends .badhash to the name of Metalink's files which have a checksum
mismatch, except without overwriting existing files.
|Issues HTTP HEAD request instead of GET and extracts Metalink metadata from response headers. Then it switches to Metalink
download. If no valid Metalink metadata is found, it falls back to ordinary HTTP download. Enables Content-Type:
application/metalink4+xml files download/processing.
|Set the Metalink application/metalink4+xml metaurl ordinal NUMBER. From 1 to the total number of "application/metalink4+xml"
available. Specify 0 or inf to choose the first good one. Metaurls, such as those from a --metalink-over-http, may have
been sorted by priority key's value; keep this in mind to choose the right NUMBER.
|Set preferred location for Metalink resources. This has effect if multiple resources with same priority are available.
|Enable use of file system's extended attributes to save the original URL and the Referer HTTP header value if used.|
Be aware that the URL might contain private information like access tokens or credentials.
| Set the DNS lookup timeout
|Overrides the route for DNS requests. The address must be specified as IPv4 or IPv6. Wget needs to be built with libcares for this option to be available.
|The given address(es) override the standard nameserver addresses;
IPv4 or IPv6, comma-separated. Wget needs to be built with libcares for this option to be available.
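For example (addresses from the documentation range, URL hypothetical; requires a libcares build):
wget --dns-servers=192.0.2.1,192.0.2.2 http://example.com/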
| Only works if compiled with debug support.