Wget
Recursively download a site into the current folder (great for open directory listings):
wget -r -np -nd [URL] # -r recursive, -np no parent, -nd no directory creation
wget:
wget -SO- [URL]          # save to stdout '-'
wget -SO /dev/null [URL] # save to /dev/null
wget [URL] -O [OUTFILE]  # overwrite file
wget [URL] -P [PATH]     # save to path, no clobber
wget [URL] -N            # timestamp - only download if newer, and clobber
wget [URL] -r            # recursively download everything, and clobber
wget [URL] -r -nd        # recursively download into current folder (no dirs), no clobber
wget [URL] -r -l [DEPTH] # levels to download (default is 5)
wget [URL] -r -k         # convert links for local viewing
wget [URL] -p            # download everything needed to display the current page (images, CSS, etc.; not recursive)
wget [URL] -r -L         # follow only relative links (helps stay on the same host)
wget [URL] -r -np        # never ascend into the parent directory
wget -e robots=off --wait 1 [URL] # ignore robots.txt and wait a second between downloads
wget [URL] -m # mirror website
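The -m flag is equivalent to -r -N -l inf --no-remove-listing. To make the mirror usable offline, it is often combined with link conversion and page requisites, e.g. (a sketch):
wget -m -k -p [URL] # mirror, convert links for local viewing, grab everything needed to render each page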
Downloading an Entire Web Site with wget | Linux Journal - http://www.linuxjournal.com/content/downloading-entire-web-site-wget
$ wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains website.org \
    --no-parent \
    www.website.org/tutorials/html/
The options are:
--recursive: download the entire Web site.
--domains website.org: don't follow links outside website.org.
--no-parent: don't follow links outside the directory tutorials/html/.
--page-requisites: get all the elements that compose the page (images, CSS and so on).
--html-extension: save files with the .html extension.
--convert-links: convert links so that they work locally, off-line.
--restrict-file-names=windows: modify filenames so that they will work in Windows as well.
--no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
It would be a very good idea to add the following options to your command so you don't hammer the server you are downloading from:
--wait=9 --limit-rate=10K
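Appended to a recursive download of the same placeholder site from above, that looks something like this:
wget --wait=9 --limit-rate=10K --recursive --no-parent www.website.org/tutorials/html/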
The Ultimate Wget Download Guide With 15 Awesome Examples - http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/
Site Download
wget -r -l1 --no-parent -A.gif http://www.locationwheretogetthefilefrom.com/dir/
-r -l1 means to retrieve recursively, with maximum depth of 1.
--no-parent means that references to the parent directory are ignored.
-A.gif means to download only the GIF files. (-A "*.gif" would have worked too, as a wildcard.)
Recursively Download FTP Site
Download an FTP site to a depth of 99 levels:
wget -r --level=99 ftp://myusername:mypassword@ftp.yoursite.com/
# -r, --recursive          turn on recursive retrieving
# -l depth, --level=depth  specify the maximum recursion depth (the default is 5)
Mirror site (infinite levels)
wget -m ftp://myusername:mypassword@ftp.yoursite.com/
# The -m option turns on mirroring, i.e. it turns on recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings.
If you download a second time, use the 'no clobber' option to keep from downloading the same files:
-nc, --no-clobber
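Note that -nc cannot be combined with -m, since -m turns on timestamping (-N) and wget refuses to mix -N with -nc. For a no-clobber re-download, use plain recursion instead (a sketch):
wget -r -l inf -nc ftp://myusername:mypassword@ftp.yoursite.com/ # re-download, skipping files that already exist locally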
Resources:
- How to download recursively from an FTP site | Linuxaria - http://linuxaria.com/howto/how-to-download-recursively-from-an-ftp-site?lang=en
Recursively Download MP3s
Download Zelda Reorchestrated MP3s:
wget -e robots=off --wait 1 -r -l1 -H --no-parent -nd -A .mp3 http://www.zreomusic.com/listen
Download all music files off of a website using wget:
wget -r -l1 -H -nd -A mp3 -e robots=off http://example/url
This will download all files of the type specified after -A from a website. Here is a breakdown of the options:
-r turns on recursion and downloads all links on the page
-l1 goes only one level of links into the page (this is really important when using -r)
-H spans hosts, meaning it will follow links to sites that don't have the same domain
-nd puts all the downloads in the current directory instead of recreating the directory tree
-A mp3 filters to only download links that are mp3s (this can be a comma-separated list of different file formats, to search for multiple types)
-e robots=off ignores the robots.txt file, which would otherwise stop programs like wget
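The same pattern with several audio formats at once (a sketch, keeping the placeholder URL from above):
wget -r -l1 -H -nd -A mp3,ogg,flac -e robots=off http://example/url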
Reference:
- Download all music files off of a website using wget | commandlinefu.com