wget [option]... [URL]...
Non-interactive download of files from the Web. Supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
Can work in the background while the user is not logged on, which allows running it from cron.
Wget can follow links in HTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as recursive downloading.
Designed for robustness over slow or unstable network connections: if a download fails due to a network problem, Wget keeps retrying until the whole file has been retrieved. If the server supports regetting, Wget instructs it to continue the download from where it left off.
The output of --help is reproduced at the end of this page.
Logging and Input File Options
-o logfile | Log all messages to logfile. Messages are normally reported to STDERR.
-a logfile | Append to logfile instead of overwriting it. Otherwise the same as -o; if logfile does not exist, it is created.
-i file | Read URLs from file. URLs given on the command line are retrieved first. The file need not be an HTML document; a plain list of URLs is enough. If file is -, the URLs are read from standard input.
-F | When input is read from a file, force it to be treated as an HTML
file. This enables retrieval of relative links from existing
HTML files on the local system, either by adding <base href="url"> to the
HTML or by using --base (-B).
-B URL | When used in conjunction with -F, prepends URL to relative
links in the file specified by -i.
Download Options
-O file | Do not write documents to separate files: they are concatenated together and written to file, or to STDOUT if file is -. Also sets the number of tries to 1.
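-nc | Do not clobber: suppresses creating numbered versions (file.n) of files downloaded more than once; newer copies of file are not retrieved. -nc may not be specified together with -N (--timestamping).
Handling a file downloaded more than once in the same directory:
Without -N, -nc, or -r, the original copy is preserved and the new copies are named file.1, file.2, and so on.
With -r but without -N or -nc, the new copy simply overwrites the old one; adding -nc preserves the original and ignores newer copies on the server.
When -N is given, with or without -r, whether a newer copy is downloaded depends on the local and remote time stamps and sizes of the file.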
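The file.1, file.2 naming used when no -N, -nc, or -r is given can be modeled as "take the first free numeric suffix". The following is a minimal sketch of that rule. `next_free_name` is a hypothetical helper, not part of Wget. With -nc, Wget skips the download instead of computing a new name.

```shell
# Hypothetical helper modeling how repeated downloads are named when
# neither -N, -nc nor -r is given: file, then file.1, file.2, ...
next_free_name() {
    base=$1
    # First download: the plain name is still free.
    if [ ! -e "$base" ]; then
        printf '%s\n' "$base"
        return
    fi
    # Otherwise pick the first numeric suffix that is not yet taken.
    n=1
    while [ -e "$base.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$base.$n"
}
```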
-N | Turn on time-stamping: a file is not re-retrieved unless the remote copy is newer than the local one.
-c | Continue getting a partially-downloaded file.
You do not need this option just to have the current invocation of Wget retry a download whose connection is lost midway through; that is the default behavior. With -c, a local file that is smaller than the remote one is considered part of an incomplete download, and only (length(remote) - length(local)) bytes are downloaded and appended.
The server must support continued downloading (for HTTP servers, via the "Range" header).
Using -c through an HTTP proxy that inserts a "transfer interrupted" string into the local file will result in a garbled file.
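The -c arithmetic can be sketched as follows: measure the local file, subtract it from the remote length, and ask for the remainder with an HTTP Range request. This sketch assumes the remote length is already known (for example from Content-Length) and that the server honors Range. `resume_plan` is a hypothetical helper, and it does not check that the local file is really a prefix of the remote one.

```shell
# Hypothetical helper sketching the -c (continue) arithmetic.
# Assumes the remote length is known and the server honors HTTP Range.
resume_plan() {
    local_file=$1
    remote_len=$2
    # Bytes already on disk (0 if nothing has been downloaded yet).
    if [ -e "$local_file" ]; then
        local_len=$(wc -c < "$local_file" | tr -d ' ')
    else
        local_len=0
    fi
    # Nothing left to fetch.
    if [ "$local_len" -ge "$remote_len" ]; then
        echo "complete"
        return
    fi
    # Only length(remote) - length(local) bytes remain; an HTTP request
    # asks for them starting at offset local_len.
    echo "missing=$((remote_len - local_len)) Range: bytes=$local_len-"
}
```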
--progress=type | Select the progress indicator type: bar or dot.
The default, bar, draws an ASCII progress bar (aka "thermometer" display) indicating the status of the retrieval. If the output is not a TTY, dot is used by default.
The progress setting in .wgetrc can be overridden from the command
line; the exception is that when the output is not a TTY, the
dot progress is favored over bar. To force the bar output, use --progress=bar:force.
--spider | Check that pages are there without downloading them. Useful for checking bookmarks:
wget --spider --force-html -i bookmarks.html
--limit-rate=Bps[k|m] | Limit the download speed to the given number of bytes per second.
The amount may be expressed in bytes (default), kilobytes (k suffix), or megabytes (m suffix); e.g. --limit-rate=20k limits the rate to 20 KB/s. Limiting is implemented by sleeping an appropriate amount of time after network reads, so it may not work well with very small files. Very low limits may be ineffective.
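The "sleep after network reads" idea works like this: compare how long a read *should* have taken at the configured rate with how long it actually took, and sleep for the difference. The following is a minimal sketch of that calculation only. Wget's own bookkeeping is more involved. The helper name `limit_sleep_ms` and the 20k = 20480 bytes conversion are assumptions.

```shell
# Hypothetical helper: given the limit (bytes/second), the bytes just read
# and how long that read took (milliseconds), print how many milliseconds
# to sleep so the average rate stays at or below the limit.
limit_sleep_ms() {
    rate=$1
    bytes=$2
    elapsed_ms=$3
    # Time the read should have taken at the configured rate.
    expected_ms=$((bytes * 1000 / rate))
    if [ "$expected_ms" -gt "$elapsed_ms" ]; then
        echo $((expected_ms - elapsed_ms))
    else
        echo 0    # the read was already slow enough
    fi
}
```

For example, with --limit-rate=20k (taken here as 20480 bytes/s), an 8192-byte read that finished in 100 ms is followed by a 300 ms pause. A very small file may complete in a read or two, before this balancing has any effect.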
-w seconds | Wait the specified number of seconds between retrievals. Use the m suffix for minutes, h for hours, or d for days.
Specifying a large value is useful if the network or the destination host is down: Wget can then wait long enough to reasonably expect the network error to be fixed before the retry.
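The m, h, and d suffixes simply scale the value to seconds. The following is a small sketch of that conversion. `wait_seconds` is a hypothetical helper that handles whole numbers only, which is a simplification.

```shell
# Hypothetical helper: convert a -w value with an optional m/h/d suffix
# to seconds (whole numbers only).
wait_seconds() {
    value=$1
    case $value in
        *m) echo $(( ${value%m} * 60 )) ;;      # minutes
        *h) echo $(( ${value%h} * 3600 )) ;;    # hours
        *d) echo $(( ${value%d} * 86400 )) ;;   # days
        *)  echo $(( value )) ;;                # plain seconds
    esac
}
```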
--random-wait | Vary the time between requests randomly between 0 and 2 * wait seconds, where wait is specified using --wait (-w). This masks Wget's presence from log analysis that looks for regular intervals between requests.
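The random pause can be sketched as a uniform draw from the range 0 to 2 * wait, which averages out to the original wait. Here awk's `rand()` stands in for Wget's own random number generator, and `random_wait` is a hypothetical helper.

```shell
# Hypothetical helper: print a pause between 0 and 2 * wait seconds,
# where wait is the value that would be given with -w.
random_wait() {
    # rand() returns a value in [0, 1), so the result lies in [0, 2*wait).
    awk -v w="$1" 'BEGIN { srand(); printf "%.2f\n", rand() * 2 * w }'
}
```

A script could then run `sleep "$(random_wait 5)"` between requests. Fractional sleep arguments work with GNU and BSD sleep but are not guaranteed by POSIX.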
-Y on|off, --proxy=on|off | Turn proxy support on or off. Proxy support is on by default if the appropriate environment variable (such as http_proxy) is defined.
-Q quota | Set the download quota, specified in bytes (default), kilobytes (k suffix), or
megabytes (m suffix). The quota never affects downloading a single file; it is
respected only when retrieving recursively or from an input
file. Example: wget -Q2m -i sites aborts the download once the
2 MB quota is reached.
Setting quota to 0 or inf removes the limit.
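The quota value and its suffixes, plus the "0 or inf means no limit" rule, can be sketched as two small checks. This assumes k = 1024 and m = 1024 * 1024 bytes. `to_bytes` and `quota_reached` are hypothetical helpers, and only whole numbers are handled.

```shell
# Hypothetical helper: convert a -Q amount to bytes.
# Assumption: k = 1024 bytes, m = 1024 * 1024 bytes; "inf" maps to 0.
to_bytes() {
    value=$1
    case $value in
        inf)   echo 0 ;;                                  # same as -Q0: no limit
        *[kK]) echo $(( ${value%[kK]} * 1024 )) ;;
        *[mM]) echo $(( ${value%[mM]} * 1048576 )) ;;
        *)     echo $(( value )) ;;
    esac
}

# Hypothetical helper: succeed when `total` downloaded bytes have reached
# the quota; a quota of 0 never triggers.
quota_reached() {
    total=$1
    quota=$2
    [ "$quota" -ne 0 ] && [ "$total" -ge "$quota" ]
}
```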
--bind-address=ADDRESS | When making client TCP/IP connections, bind() to ADDRESS on the
local machine. ADDRESS may be specified as a hostname or IP
address. Useful when Wget's host has multiple IP addresses.
-t number | Set the number of retries to number (default 20). Specify 0 or inf for infinite retrying.
--waitretry=seconds | Wait only between retries of failed downloads, not between every retrieval. Wget
uses linear backoff, waiting 1 second after the first failure
on a given file, then 2 seconds after the second failure on
that file, up to the maximum number of seconds specified.
Therefore, a value of 10 makes Wget wait up to (1 + 2 + ... + 10) = 55 seconds per file. This option is on by default in the global wgetrc file.
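The 55-second figure is the sum of the per-retry waits, each capped at the --waitretry value. The following sketch adds those waits up for a given number of failures. `max_total_wait` is a hypothetical helper.

```shell
# Hypothetical helper: total linear-backoff wait for `failures` failed
# attempts on one file, each wait capped at `max` seconds (--waitretry).
max_total_wait() {
    max=$1
    failures=$2
    total=0
    i=1
    while [ "$i" -le "$failures" ]; do
        # i seconds after the i-th failure, but never more than max.
        if [ "$i" -lt "$max" ]; then w=$i; else w=$max; fi
        total=$((total + w))
        i=$((i + 1))
    done
    echo "$total"
}
```

With --waitretry=10 and ten failures this gives 1 + 2 + ... + 10 = 55 seconds. Any further failures each add another 10 seconds.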
-T seconds | Set the read timeout to seconds. Whenever a network read
is issued, the file descriptor is checked for a timeout, which
could otherwise leave a pending connection (uninterrupted read).
The default is 900 seconds (fifteen minutes). Setting the
timeout to 0 disables checking for timeouts. Use caution when changing the default.
Directory Options
-nd | Do not create a hierarchy of directories when retrieving recursively; all files are saved to
the current directory without clobbering (if a name shows up more
than once, the filenames get extensions .n).
-x | The opposite of -nd: create a hierarchy of directories, even if
one would not have been created otherwise. E.g. wget -x
http://fly.srk.fer.hr/robots.txt will save the downloaded file to
fly.srk.fer.hr/robots.txt.
-nH | Disable generation of host-prefixed directories. By default,
invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option
disables such behavior.
--cut-dirs=number | Ignore number remote directory components. This is useful for getting
fine-grained control over the directory where recursive retrieval will be saved.
For example, the directory at
To suppress the directory structure, this option
is similar to a combination of
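The effect of --cut-dirs on a saved path can be sketched as dropping leading directory components one at a time. If more components are cut than exist, only the file name is left. This assumes the host directory has already been removed, as with -nH. `cut_dirs` is a hypothetical helper, and the paths in the test are made up.

```shell
# Hypothetical helper: drop the first n directory components of a remote
# path (host part assumed already removed, as with -nH).
cut_dirs() {
    path=${1#/}      # strip a leading slash
    n=$2
    while [ "$n" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # remove one leading directory
            *)   break ;;             # only the file name is left
        esac
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}
```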
-P prefix | Set the directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved,
i.e. the top of the retrieval tree. The default is . (the current directory).
HTTP Options
-E | If a file of type text/html is downloaded and the URL does not end
with the regexp \.[Hh][Tt][Mm][Ll]?, .html is appended to the local filename. Useful when mirroring a remote site that uses .asp pages,
or when downloading the output of CGIs: the URL http://site.com/article.cgi?25
will be saved as article.cgi?25.html.
WARNING: filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget cannot tell that the local X.html file corresponds to the remote URL X. To prevent this, use --convert-links and --backup-converted
so that the original version of the file is saved as X.orig.
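The -E naming rule can be sketched as a check against the \.[Hh][Tt][Mm][Ll]? suffix, rewritten as a POSIX basic regular expression. `html_local_name` is a hypothetical helper. It takes the file name (the last part of the URL) and the Content-Type value as arguments.

```shell
# Hypothetical helper sketching -E: append .html to a text/html document
# whose name does not already end in .htm or .html (any case).
html_local_name() {
    name=$1
    ctype=${2%%;*}     # drop parameters such as "; charset=utf-8"
    if [ "$ctype" = "text/html" ] &&
       ! printf '%s\n' "$name" | grep -q '\.[Hh][Tt][Mm][Ll]\{0,1\}$'; then
        name="$name.html"
    fi
    printf '%s\n' "$name"
}
```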
-s | Save the headers sent by the HTTP server to the file, before the actual contents, with an empty line as the separator.
-C on|off | When off, Wget
sends "Pragma: no-cache" to disable server-side caching and get the file from the remote service rather than a cached copy. Useful for retrieving and flushing out-of-date documents on proxy servers. Default: on.
--cookies=on|off | When off, disable the use of cookies. Default: on.
--load-cookies file | Load cookies from file before the first HTTP retrieval.
file is a text file in the format originally used by Netscape's cookies.txt file.
Use this option when mirroring sites that require you to be logged in to access their content. The login process typically works by the web server issuing an HTTP cookie upon receiving and verifying your credentials. The browser then resends the cookie when accessing that part of the site, and so proves your identity.
Mirroring such a site requires Wget to send the same cookies your
browser sends when communicating with the site. This is achieved
by --load-cookies: point Wget to the cookies.txt file and it sends the same cookies your browser would. Alternatively, supply the cookie by hand:
wget --cookies=off --header "Cookie: <name>=<value>"
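To use the manual --header alternative with an existing cookies.txt, the name=value pairs have to be pulled out of the file. The following sketch assumes the usual Netscape layout: seven TAB-separated fields (domain, include-subdomains flag, path, secure flag, expiry, name, value), with lines starting with # as comments. It does no domain, path, or expiry matching, so filter the file for the target site first. `cookie_header` is a hypothetical helper.

```shell
# Hypothetical helper: join the name=value pairs of a Netscape-format
# cookies.txt into one Cookie header value ("a=1; b=2").
cookie_header() {
    awk -F '\t' '
        /^#/ || NF != 7 { next }        # skip comments and malformed lines
        {
            sep = (pairs == "") ? "" : "; "
            pairs = pairs sep $6 "=" $7  # field 6 = name, field 7 = value
        }
        END { print pairs }
    ' "$1"
}
```

Usage: `wget --cookies=off --header "Cookie: $(cookie_header cookies.txt)" URL`.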
--save-cookies file | Save cookies to file at the end of the session. Cookies whose expiry time is not specified, or those that have already expired, are not saved.
--ignore-length | Ignore the "Content-Length" header. Useful because some CGI programs send out incorrect "Content-Length" headers.
--header=additional-header | Define an additional header to be passed to the HTTP servers.
Headers must contain a : preceded by one or more non-blank characters, and must not contain newlines.
Define more than one additional header by specifying --header more than once.
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all
previous user-defined headers.
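The two header rules (a colon preceded by at least one non-blank character, and no newlines) can be checked before calling Wget. The following is a minimal sketch. `valid_header` is a hypothetical helper that returns 0 for a valid header. It rejects the empty string, which Wget treats specially as "clear previous headers".

```shell
# Hypothetical helper: succeed if $1 satisfies the --header rules.
valid_header() {
    nl='
'
    case $1 in
        *"$nl"*) return 1 ;;     # headers must not contain newlines
    esac
    # One or more non-blank, non-colon characters, then the colon.
    printf '%s' "$1" | grep -q '^[^[:space:]:][^[:space:]:]*:'
}
```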
--proxy-user=user, --proxy-passwd=password | Specify the username and password for authentication on a proxy server. Wget encodes them using the "basic" authentication scheme.
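The "basic" scheme is only an encoding: user and password are joined with a colon and base64-encoded into a Proxy-Authorization header. Anyone who sees the header can decode it. The following sketch shows the encoding. `basic_auth_header` is a hypothetical helper that relies on the base64 utility from GNU coreutils or BSD.

```shell
# Hypothetical helper: build the header the "basic" scheme produces for
# a proxy user and password (base64 of "user:password", not encryption).
basic_auth_header() {
    # tr removes line wrapping that base64 may add for long credentials.
    printf 'Proxy-Authorization: Basic %s\n' \
        "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```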
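--http-user=user, --http-passwd=password | Specify the username and password on an HTTP server. Depending on the type of challenge, Wget encodes them using either the "basic" (insecure) or the "digest" authentication scheme.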
Wget respects the Robot Exclusion file (/robots.txt). It can convert the links in downloaded HTML files to point to the local files for offline viewing.
Download a URL:
wget http://fly.srk.fer.hr/
If the connection is slow and the file is lengthy, the connection may fail more than once before the whole file is retrieved. Wget keeps trying until it either gets the whole file or exceeds the default number of retries (20).
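Run Wget in the background with 45 tries, writing its progress to the file log (the trailing & puts it in the background):
wget --tries 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
To retry indefinitely, use -t inf. Use with caution.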
Using FTP is just as simple; Wget takes care of the login and password:
wget ftp://gnjilux.srk.fer.hr/welcome.msg
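If you specify a directory, Wget retrieves the directory listing, parses it, and converts it to HTML (index.html), which can then be viewed with a browser such as links:
wget ftp://prep.ai.mit.edu/pub/gnu/
links index.html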
To download the URLs listed in a file, use -i:
wget --input-file=urlfile
If you specify - as the file name, the URLs are read from standard input.
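Mirror the GNU web site recursively (to the default depth of five levels), keeping the original directory structure and saving the log of the activities to gnulog:
wget --recursive http://www.gnu.org/ -o gnulog
The same, but convert the links in the HTML files to point to local files for offline viewing:
wget --convert-links -r http://www.gnu.org/ -o gnulog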
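Retrieve a single HTML page and all elements needed to display it (-p), converting the links to local files:
wget -p --convert-links http://www.server.com/dir/page.html
The HTML page will be saved to www.server.com/dir/page.html, and the images, stylesheets, etc., somewhere under www.server.com/, depending on where they were on the remote server.
Save all those files directly in the download/ directory, without host or directory hierarchy:
wget -p --convert-links -nH -nd -Pdownload \
http://www.server.com/dir/page.html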
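Retrieve the index.html of www.lycos.com, showing the original server headers:
wget -S http://www.lycos.com/
Save the server headers with the file, then view it:
wget -s http://www.lycos.com/
more index.html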
Retrieve the first two levels of wuarchive.wustl.edu, saving them under /tmp:
wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
To download all the GIFs from a directory on an HTTP server, wget http://www.server.com/dir/*.gif
does not work, because HTTP retrieval does not support globbing. Use instead:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
-r -l1 means to retrieve recursively, with maximum depth of 1. --no-parent means that references to the parent directory are ignored, and -A.gif means to download only the GIF files. -A "*.gif" would have worked too.
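Why -A.gif and -A "*.gif" behave the same: an accept-list entry without wildcard characters is matched as a file-name suffix, while an entry containing *, ?, or [ is matched as a shell-style pattern. The following sketch shows that rule for a single entry. `accepted` is a hypothetical helper, whereas Wget itself takes a comma-separated list.

```shell
# Hypothetical helper: succeed if file name $1 is accepted by the single
# -A entry $2 (suffix match, or pattern match if $2 has wildcards).
accepted() {
    name=$1
    rule=$2
    case $rule in
        *[*?[]*) pattern=$rule ;;     # wildcard present: use as a pattern
        *)       pattern="*$rule" ;;  # plain entry: treat as a suffix
    esac
    case $name in
        $pattern) return 0 ;;
    esac
    return 1
}
```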
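If Wget was interrupted in the middle of a recursive download, continue without clobbering the files already present:
wget -nc -r http://www.gnu.org/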
To supply your own username and password for HTTP or FTP, use the appropriate URL syntax:
wget ftp://hniksic:mypassword@unix.server.com/.emacs
This usage is not advisable on multi-user systems because it reveals your password to anyone who looks at the output of "ps".
Send the output documents to standard output instead of to files:
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
Combine -O - with -i - to make pipelines that retrieve the documents from remote hotlists:
wget -O - http://cool.list.com/ | wget --force-html -i -
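To keep a mirror of a page (or FTP subdirectories), use --mirror (-m), which is shorthand for -r -l inf --timestamping.
You can put Wget in the crontab file, asking it to recheck a site each Sunday:
crontab
0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog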
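To also convert the links for local viewing, backing up the original HTML files first (link conversion does not play well with time-stamping):
wget --mirror --convert-links --backup-converted \
http://www.gnu.org/ -o /home/me/weeklog
To also save all files served as text/html with a .html extension:
wget --mirror --convert-links --backup-converted \
--html-extension -o /home/me/weeklog \
http://www.gnu.org/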
Or, with less typing:
wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
/usr/local/etc/wgetrc | Default location of the global startup file.
.wgetrc | User startup file.
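Settings mentioned above, such as tries, waitretry, and progress, can live in a startup file instead of on every command line. The following sketch writes a small user startup file using the .wgetrc "variable = value" syntax. It then points Wget at that file through the WGETRC environment variable, which is used in place of ~/.wgetrc. The temporary location and the chosen values are only examples.

```shell
# Sketch: write an example startup file (temporary, hypothetical location)
# and make Wget read it instead of ~/.wgetrc.
rc_file=$(mktemp)
cat > "$rc_file" <<'EOF'
# Retry each file up to 20 times, backing off up to 10 s between retries.
tries = 20
waitretry = 10
# Use the dot progress display.
progress = dot
EOF
export WGETRC="$rc_file"
```

Any wget started from this shell then picks up these settings. A single setting can also be given for one run with -e, e.g. `wget -e 'tries = 20' http://fly.srk.fer.hr/`.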
Before actually submitting a bug report, please try to follow a few simple guidelines.
This document was taken from:
GNU Wget 1.8.2 2003-01-25 WGET(1)
and reworked for terseness and HTML formatting by Dennis German
--help
GNU Wget 1.8.2, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.
Startup:
-V, --version display the version of Wget and exit.
-h, --help print this help.
-b, --background go to background after startup.
-e, --execute=COMMAND execute a `.wgetrc'-style command.
Logging and input file:
-o, --output-file=FILE log messages to FILE.
-a, --append-output=FILE append messages to FILE.
-d, --debug print debug output.
-q, --quiet quiet (no output).
-v, --verbose be verbose (this is the default).
-nv, --non-verbose turn off verboseness, without being quiet.
-i, --input-file=FILE download URLs found in FILE.
-F, --force-html treat input file as HTML.
-B, --base=URL prepends URL to relative links in -F -i file.
--sslcertfile=FILE optional client certificate.
--sslcertkey=KEYFILE optional keyfile for this certificate.
--egd-file=FILE file name of the EGD socket.
Download:
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on local host.
-t, --tries=NUMBER set number of retries to NUMBER (0 unlimits).
-O --output-document=FILE write documents to FILE.
-nc, --no-clobber don't clobber existing files or use .# suffixes.
-c, --continue resume getting a partially-downloaded file.
--progress=TYPE select progress gauge type.
-N, --timestamping don't re-retrieve files unless newer than local.
-S, --server-response print server response.
--spider don't download anything.
-T, --timeout=SECONDS set the read timeout to SECONDS.
-w, --wait=SECONDS wait SECONDS between retrievals.
--waitretry=SECONDS wait 1...SECONDS between retries of a retrieval.
--random-wait wait from 0...2*WAIT secs between retrievals.
-Y, --proxy=on/off turn proxy on or off.
-Q, --quota=NUMBER set retrieval quota to NUMBER.
--limit-rate=RATE limit download rate to RATE.
Directories:
-nd --no-directories don't create directories.
-x, --force-directories force creation of directories.
-nH, --no-host-directories don't create host directories.
-P, --directory-prefix=PREFIX save files to PREFIX/...
--cut-dirs=NUMBER ignore NUMBER remote directory components.
HTTP options:
--http-user=USER set http user to USER.
--http-passwd=PASS set http password to PASS.
-C, --cache=on/off (dis)allow server-cached data (normally allowed).
-E, --html-extension save all text/html documents with .html extension.
--ignore-length ignore `Content-Length' header field.
--header=STRING insert STRING among the headers.
--proxy-user=USER set USER as proxy username.
--proxy-passwd=PASS set PASS as proxy password.
--referer=URL include `Referer: URL' header in HTTP request.
-s, --save-headers save the HTTP headers to file.
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION.
--no-http-keep-alive disable HTTP keep-alive (persistent connections).
--cookies=off don't use cookies.
--load-cookies=FILE load cookies from FILE before session.
--save-cookies=FILE save cookies to FILE after session.
FTP options:
-nr, --dont-remove-listing don't remove `.listing' files.
-g, --glob=on/off turn file name globbing on or off.
--passive-ftp use the "passive" transfer mode.
--retr-symlinks when recursing, get linked-to files (not dirs).
Recursive retrieval:
-r, --recursive recursive web-suck -- use with care!
-l, --level=NUMBER maximum recursion depth (inf or 0 for infinite).
--delete-after delete files locally after downloading them.
-k, --convert-links convert non-relative links to relative.
-K, --backup-converted before converting file X, back up as X.orig.
-m, --mirror shortcut option equivalent to -r -N -l inf -nr.
-p, --page-requisites get all images, etc. needed to display HTML page.
Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted extensions.
-R, --reject=LIST comma-separated list of rejected extensions.
-D, --domains=LIST comma-separated list of accepted domains.
--exclude-domains=LIST comma-separated list of rejected domains.
--follow-ftp follow FTP links from HTML documents.
--follow-tags=LIST comma-separated list of followed HTML tags.
-G, --ignore-tags=LIST comma-separated list of ignored HTML tags.
-H, --span-hosts go to foreign hosts when recursive.
-L, --relative follow relative links only.
-I, --include-directories=LIST list of allowed directories.
-X, --exclude-directories=LIST list of excluded directories.
-np, --no-parent don't ascend to the parent directory.