--[no]config=file Specify the location of a startup file
| --rejected-log=logfile
Logs all URL rejections to logfile as comma-separated values. Each entry includes the reason for rejection, the URL, and the
parent URL it was found in.
| --no-netrc Do not obtain credentials from .netrc
|
Download Options
|
---|
-O file --output-document=file
| Documents will be concatenated and written to file , or to STDOUT if - is specified.
Sets --tries to 1.
| -nc --no-clobber |
Suppresses creating numbered versions of duplicate files named file.n .
Newer copies of file are not retrieved.
-nc may not be specified with --timestamping .
Handling a file downloaded more than once in the same directory:
Without --timestamping , --no-clobber , or --recursive , downloading the same file
in the same directory will result in the original copy of file
being preserved and the second copy being named file.n .
With --recursive , but without --timestamping or --no-clobber ,
the last downloaded copy of the file is retained.
With --timestamping , only newer versions of the file are downloaded.
When -nc is specified, files with the suffixes .html or
.htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
| --backups=n
Before overwriting a file, back up the existing file by adding a .1 suffix; backups are rotated to .2, .3, and so on, up to n
| -N --timestamping | Turn on time-stamping.
| --no-if-modified-since
Do not send the If-Modified-Since header in --timestamping mode; send a preliminary HEAD request instead. Only affects --timestamping mode.
| --no-use-server-timestamps
Don't set the local file's timestamp to the one on the server.
By default, timestamps match the remote file, which
allows the use of --timestamping on subsequent invocations of wget.
Use this option when the local file's timestamp should instead reflect when the file was actually downloaded.
| -c --continue | Continue getting a partially-downloaded file.
You need not specify this option merely to have the current invocation retry a download should the connection be lost midway through; that is done by default.
Only affects resumption of downloads started prior to this invocation of Wget, and whose local files are still present.
A local file that is smaller than the server's copy is considered an incomplete download and only the
remaining bytes will be downloaded and appended.
The server must support continued downloading via the Range header.
Using -c with -r will treat every file as an "incomplete download" candidate.
A garbled local file can result from an HTTP proxy that inserts a "transfer interrupted" string into it; -c will only append to such a file, not repair it.
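The resume mechanism above boils down to requesting only the bytes the local file is missing via the Range header. A minimal Python sketch (range_header is a hypothetical helper, not part of wget):

```python
import os

def range_header(local_path):
    """Build the Range header a client would send to resume a download:
    request everything from the current local file size onward.
    The server answers 206 Partial Content and the remainder is appended."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    return {"Range": "bytes=%d-" % have} if have else {}
```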
| --start-pos=offset
Start downloading at zero-based position offset , expressed in bytes, kilobytes with the `k' suffix, or
megabytes with the `m' suffix, etc.
Takes precedence over --continue .
| --progress=
dot[:style] | bar
| Default: bar , which
draws an ASCII progress bar (aka thermometer display) indicating the status of retrieval.
If the output is not a TTY, dot will be used by default.
style | dot size | dots per cluster | dots per line | line represents
default | 1K | 10 | 50 | 50K
binary | 8K | 16 | 48 | 384K
mega | 64K | 8 | 48 | 3M
|
A progress setting in .wgetrc is overridden from the command line, except that when the output is not a TTY, the
dot progress will be favored over bar .
To force the bar output, use --progress=bar:force .
| --show-progress Force wget to display the progress bar regardless of verbosity level.
| --spider | Pages are only checked, not downloaded. Useful for checking bookmarks.
wget --spider --force-html -i bookmarks.html
| --limit-rate=rate[k|m]
| Limit the download rate, expressed in bytes per second (default), kilobytes ( k suffix), or megabytes ( m suffix).
Example: --limit-rate=20k
Implemented by sleeping an appropriate amount of time after network reads.
May not work with very small files. Specifying a bandwidth limit below about 1 KB/s may be ineffective.
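The sleep-after-read technique can be sketched in a few lines of Python (rate_limited_copy is an illustrative helper, not wget code):

```python
import time

def rate_limited_copy(src, dst, limit_bps, chunk=8192):
    """Copy src to dst, sleeping after each read so the average
    transfer rate stays at or below limit_bps (bytes per second).
    This mirrors the technique described above: sleep after network reads."""
    start = time.monotonic()
    total = 0
    while True:
        data = src.read(chunk)
        if not data:
            break
        dst.write(data)
        total += len(data)
        expected = total / limit_bps          # seconds the transfer *should* have taken
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
```

This also shows why tiny files defeat the limiter: the whole file fits in the first read, so there is nothing left to delay.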
| -w seconds --wait=seconds
| Wait the given number of seconds between retrievals. For minutes use the m suffix, for hours h , for days d .
Specifying a large value is useful if the network
or the destination host is down. Wget can wait long enough
to reasonably expect the network error to be fixed before the retry.
| --random-wait
| Causes the time between requests to vary between 0 and 2 * wait seconds,
where wait is specified using --wait .
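In other words, each delay is drawn uniformly from [0, 2*wait], so the average inter-request delay still equals the configured --wait value. A sketch (the function name is made up):

```python
import random

def random_wait(wait_seconds):
    """Pick a delay uniformly from [0, 2*wait], so the mean
    delay still equals the configured --wait value."""
    return random.uniform(0, 2 * wait_seconds)
```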
| --proxy=on|off, -Y on|off
| On by default if the appropriate environment variable is defined.
| -Q quota --quota=quota
| Specify the quota in bytes , kilobytes ( k suffix), or megabytes ( m ).
Does not affect downloading a single file;
takes effect when retrieving recursively or from an input
file list. Example: wget --quota=2m --input-file=retrive.lst .
| --bind-address=address | Bind to address , an interface hostname or IP address on the local machine.
| -t number --tries=number | Set the number of retries to number (0 or inf for infinite).
| --waitretry=seconds
| Only wait between retries of failed downloads.
Uses linear backoff, waiting 1 second after the first failure on a given file, then 2 seconds … up to the maximum number of seconds specified.
Therefore, a value of 10 will wait up to (1 + 2 + ... + 10) = 55 seconds per file.
On by default, with a value of 10 seconds.
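The worst-case cumulative wait quoted above is just an arithmetic series; a quick check in Python (total_retry_wait is illustrative, not a wget function):

```python
def total_retry_wait(max_wait):
    """Worst-case cumulative wait under --waitretry's linear backoff:
    1 s after the first failure, 2 s after the second, ... up to max_wait."""
    return sum(range(1, max_wait + 1))
```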
| -T seconds --timeout=seconds
| Set the network timeout; equivalent to setting --dns-timeout , --connect-timeout , and --read-timeout at once. The read timeout defaults to 900 seconds (15 minutes!).
| --connect-timeout=seconds Set the connect timeout for TCP connections
| --read-timeout=seconds
Set the read (and write) timeout, measured as idle time: if
no data is received for more than seconds , reading fails and the download is restarted.
Does not limit the duration of the entire download.
Default 900 seconds.
|
Directory Options
|
---|
-nd --no-directories
| Do not create a hierarchy of directories when retrieving recursively; files will be saved to
the current directory, without clobbering (if a name shows up more
than once, the filenames will get extensions .n ).
| -x --force-directories
| The opposite of -nd : create a hierarchy of directories, even if
one would not have been created otherwise. E.g. wget -x
http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt .
| -nH --no-host-directories
| Disable generation of host-prefixed directories. By default,
invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/ . This option
disables such behavior.
| --cut-dirs=number
| Ignore number directory components. This is useful for
fine-grained control over the directory where recursive retrievals will be saved.
For example, take the directory at
ftp://ftp.xemacs.org/pub/xemacs/ . If you retrieve it with -r , it will be saved locally under ftp.xemacs.org/pub/xemacs/ . While the
-nH option can remove the ftp.xemacs.org/ part, you are still left with pub/xemacs/ . Using --cut-dirs makes
Wget blind to number remote directory components.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
...
In its ability to suppress the directory structure, this option
is similar to a combination of -nd and -P . However, unlike -nd ,
--cut-dirs does not lose subdirectory structure---for instance, with
-nH --cut-dirs=1 , a beta/ subdirectory will be placed in
xemacs/beta , as one would expect.
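The mapping in the list above can be mimicked with a short Python helper (local_dir, its parameters, and the default host are illustrative, not part of wget):

```python
def local_dir(remote_path, cut=0, no_host=False, host="ftp.xemacs.org"):
    """Approximate the directory wget would create: drop the host
    component if no_host (-nH) and ignore the first `cut` remote
    directory components (--cut-dirs)."""
    parts = [p for p in remote_path.strip("/").split("/") if p][cut:]
    if not no_host:
        parts.insert(0, host)
    return "/".join(parts) or "."
```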
| -P prefix --directory-prefix=prefix
| The directory prefix is the directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree.
The default is . (the current directory).
|
HTTP Options
|
---|
-E --html-extension
| If a file of type text/html is downloaded and the URL does not
match the regexp
\.[Hh][Tt][Mm][Ll]? , .html will be appended to the local filename. Use when mirroring a remote site that uses .asp
or CGIs. The URL http://site.com/article.cgi?25
will be saved as article.cgi?25.html .
WARNING: filenames changed in this way will be re-downloaded every
time you re-mirror a site.
To prevent this, use --convert-links and --backup-converted
so that the original version of the file will be saved as X.orig .
| -s --save-headers
| Save the headers sent by the HTTP server to the file, before the actual contents, with an empty line as the separator.
| -C on|off --cache=on|off
| If off, Wget
sends Pragma: no-cache to disable server-side caching. Useful for retrieving out-of-date documents from proxy servers. Default: on.
| --cookies=on|off | Enable or disable the use of cookies.
| --load-cookies file
| Load cookies from file before the first HTTP retrieval.
file is a textual file in the format originally used by Netscape's cookies.txt.
Use this option when mirroring sites that
require that you be logged in to access their content. The login process typically works by the web server issuing
an HTTP cookie upon receiving and verifying your credentials. The
cookie is then resent by the browser when accessing that part of
the site, and so serves as proof of your identity.
Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved
by --load-cookies---simply point Wget to the location of the cookies.txt file, and it will send the same cookies your browser would
send in the same situation. Different browsers keep textual cookie
files in different locations:
Netscape 4.x:
The cookies are in ~/.netscape/cookies.txt .
Mozilla and Netscape 6.x:
Mozilla's cookie file is also named cookies.txt , located somewhere under ~/.mozilla , in the directory of your profile. The
full path usually ends up looking somewhat like
~/.mozilla/default/some-weird-string/cookies.txt .
Internet Explorer:
You can produce a cookie file Wget can use by using the File
menu, Import and Export, Export Cookies.
Other browsers:
If you are using a different browser to create your cookies,
--load-cookies will only work if you can locate or produce a
cookie file in the Netscape format that Wget expects.
If you cannot use --load-cookies , there might still be an alternative. If your browser supports a cookie manager, you can use
it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually
instruct Wget to send those cookies, bypassing the official
cookie support:
wget --cookies=off --header "Cookie: name=value"
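For reference, the Netscape cookies.txt format that --load-cookies expects is a plain text file of tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry (Unix time), name, and value. The entry below is entirely made up, for illustration only:

```
# Netscape HTTP Cookie File
# domain  include-subdomains  path  secure  expiry      name       value
.example.com	TRUE	/	FALSE	1893456000	sessionid	abc123
```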
| --save-cookies file
| Save cookies to file before exiting. Cookies whose expiry time is not specified, or those that have already expired, are not saved.
| --ignore-length
| Ignore the Content-Length header. Some CGI programs send out incorrect "Content-Length" headers.
| --header=additional-header
| passed to the HTTP servers.
Headers must contain a : preceded by one or more non-blank characters, and must not contain newlines.
Define more than one additional header by specifying
--header more than once.
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
| --proxy-user=user
--proxy-passwd=password
| Specify the username and password for authentication on a proxy server. Encoded using the "basic" authentication scheme.
| --http-user=user
| --http-passwd=password
| According to the type of the challenge, Wget will encode them using
either the "basic" (insecure) or the "digest" authentication scheme.
Another way to specify username and password is in the URL itself.
Either method reveals the password to ps . To prevent the passwords from being seen, store them in
.wgetrc or .netrc , and make sure to protect those files with "chmod".
| --referer=url
| Include a 'Referer: url ' header in the HTTP request.
| -U agent-string --user-agent=agent-string
| Identify as agent-string to the HTTP server, via the
"User-Agent" header field.
Default: Wget/version .
|
FTP Options
ftp://user:pass@host
| | A URL without a filename creates an HTML-formatted directory listing saved as index.html , including complete <a href …
| -nr --dont-remove-listing
| Don't remove the .listing files generated by FTP retrievals, which contain the raw directory listings.
Wget takes the directory listing, creates an HTML page including complete <a href… , then
deletes .listing .
| -g on|off
--glob=on|off
| Use shell-like special characters (wildcards), like * , ? , [ and ] ,
to retrieve more than one file:
wget "ftp://gnjilux.srk.fer.com/mail/*.msg"
On by default.
Quote the URL to protect it from being expanded by the shell.
| --passive-ftp | Use passive FTP, in which the client initiates the data connection.
| --retr-symlinks
| When retrieving FTP directories recursively and a symbolic
link is encountered, the linked-to file is not downloaded.
Instead, a matching symbolic link is created on the local filesystem. The pointed-to file will not be downloaded unless the recursive retrieval would have encountered it separately and downloaded it anyway.
When --retr-symlinks is specified, symbolic links are traversed and the pointed-to files are retrieved. This
option does not cause Wget to traverse symlinks to directories and
recurse through them.
When retrieving a file (not a directory)
specified on the command line, rather than one reached by recursion,
this option has no effect. Symbolic links are always traversed
in this case.
|
Recursive Retrieval Options
| -r --recursive | Turn on recursive retrieving.
| -l depth --level=depth | Specify the maximum recursion depth. Default is 5.
| --delete-after
| useful for pre-fetching popular pages through a proxy, e.g.:
wget -r -nd --delete-after http://whatever.com/~popular/page/
The -r option is to retrieve recursively, and -nd to not create directories.
--delete-after deletes files on the local machine. It does not issue the DELE command to remote FTP sites.
When --delete-after is specified, --convert-links is
ignored, so .orig files are simply not created in the first place.
| -k --convert-links
| Fix links for local viewing. Affects visible hyperlinks, as well as any part of the document that links to
external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
- links to files that have been downloaded will be changed to refer to the file they point to as a relative link.
Example: if the downloaded file /foo/doc.html links to
/bar/img.gif , also downloaded, then the link in doc.html will
be modified to point to ../bar/img.gif .
- links to files that have not been downloaded by Wget will
be changed to include host name and absolute path.
Example: if the downloaded file /foo/doc.html links to
/bar/img.gif it will be modified to point to http://hostname/bar/img.gif .
Conversion is performed at the end of all the downloads.
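The relative-link rewrite in the first case is plain path arithmetic; a sketch in Python (relativize is a hypothetical helper, not wget code):

```python
import posixpath

def relativize(link_target, containing_doc):
    """Rewrite an absolute site path as a link relative to the page that
    contains it, mirroring what --convert-links does for downloaded files."""
    return posixpath.relpath(link_target, posixpath.dirname(containing_doc))
```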
| -K --backup-converted | When converting a file, back up the original version with .orig .
Affects the behavior of --timestamping .
| -m --mirror
| Turn on options suitable for mirroring, i.e. recursion and time-stamping, sets unlimited recursion depth and
keeps FTP directory listings.
Equivalent to -r --timestamping --level inf -nr .
| -p --page-requisites
| Download files necessary to display the HTML page including
inlined images, sounds, and referenced stylesheets.
Using -r with --level alone may not retrieve everything needed to display a page.
For example, suppose:
1.html contains an <IMG> referencing 1.gif and an <A> pointing to external document 2.html .
2.html has image 2.gif and links to 3.html with image 3.gif .
With:
wget --recursive --level 2 http://site/1.html
1.html, 1.gif, 2.html, 2.gif , and 3.html will be downloaded,
but 3.gif is not, because Wget simply counts the number of hops (up to 2) away from 1.html in
order to determine where to stop the recursion.
Instead:
wget -r --level 2 --page-requisites http://site/1.html
all the above files and 3.html's requisite 3.gif will be downloaded. Similarly,
wget -r --level 1 --page-requisites http://site/1.html
will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.
wget -r --level 0 --page-requisites http://site/1.html
does not download just 1.html and 1.gif , because -l 0 is equivalent to -l inf , that is, infinite
recursion. To download a single HTML page,
specified on the commandline or in a -i URL input file and its
requisites, omit -r and -l :
wget -p http://site/1.html
Wget will behave as if -r had been specified, but only that single page and its requisites will be downloaded.
Links from that page to external documents will not be followed.
To download a single page and all its requisites (even if they exist on separate websites), use:
wget --html-extension --span-hosts --convert-links --backup-converted --page-requisites http://site/document
An external document link is any URL in an <A>,
<AREA> or a <LINK> tag other than <LINK REL="stylesheet">.
| Recursive Accept/Reject Options
|
---|
-A acclist --accept acclist
-R rejlist --reject rejlist
| Comma-separated lists of file name suffixes or patterns to
accept or reject.
| --follow-tags=list |
| --ignore-tags=list
-G list
| Wget has an internal table of HTML tag / attribute pairs that it
considers when looking for linked documents during a recursive
retrieval.
To specify a subset of tags to be considered, give them as a comma-separated list .
To specify tags to be ignored, use -G .
| -H --span-hosts | Enable spanning across hosts when doing recursive retrieving.
| -D list --domains=list
| Set domains to be followed. list is a comma-separated list of domains.
This does not turn on -H .
| --exclude-domains list
| Specify the domains not to be followed.
| -I list --include-directories=list |
| -X list --exclude-directories=list
| Specify a comma-separated list of directories to follow ( -I ) or exclude ( -X )
when downloading. Elements of list may contain wildcards.
| -np --no-parent | Do not ascend to the parent directory when retrieving recursively.
| -L --relative
| Follow relative links only. Useful for retrieving a specific home
page without any distractions, not even those from the same hosts.
| --follow-ftp | Follow FTP links from HTML documents. Default: ignore all the FTP links.
| -b --background
| Go to background immediately after startup. If no output file is
specified via -o , output is redirected to wget-log .
| -e command --execute command
| Execute command after the commands in .wgetrc .
| -V --version | Display the version of Wget.
| -h --help | Print a help message describing all of Wget's command-line options.
|
--input-metalink=file
Downloads files covered in a local Metalink file. Metalink versions 3 and 4 are supported.
| --keep-badhash
Keep downloaded Metalink files with a bad hash: .badhash is appended to the name of files whose checksum
does not match, without overwriting existing files.
| --metalink-over-http
Issues an HTTP HEAD request instead of GET and extracts Metalink metadata from the response headers, then switches to Metalink
download. If no valid Metalink metadata is found, falls back to an ordinary HTTP download. Enables downloading/processing of Content-Type:
application/metalink4+xml files.
| --metalink-index=number
Set the Metalink application/metalink4+xml metaurl ordinal NUMBER, from 1 to the total number of "application/metalink4+xml"
metaurls available. Specify 0 or inf to choose the first good one. Metaurls, such as those from a --metalink-over-http response, may have
been sorted by the priority key's value; keep this in mind when choosing NUMBER.
| --preferred-location
Set the preferred location for Metalink resources. Takes effect when multiple resources with the same priority are available.
| --xattr
Enable use of the file system's extended attributes to save the original URL and, if used, the Referer HTTP header value.
Be aware that these may contain private information like access tokens or credentials.
| --dns-timeout=seconds Set the DNS lookup timeout
| --bind-dns-address=address
Overrides the route for DNS requests; address may be IPv4 or IPv6 . Wget needs to be built with libcares for this option.
| --dns-servers=addresses The given address(es) override the standard nameserver addresses;
IPv4 or IPv6 , comma-separated. Wget needs to be built with libcares for this option.
| -d --debug | Turn on debug output. Only works if Wget was compiled with debug support.