Downloading an Entire Web Site with wget

September 5th, 2008 by Dashamir Hoxha

If you ever need to download an entire Web site, perhaps for off-line viewing, wget can do the
job—for example:

$ wget \
     --recursive \
     --no-clobber \
     --page-requisites \
     --html-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains website.org \
     --no-parent \
         www.website.org/tutorials/html/

This command downloads the Web site www.website.org/tutorials/html/.

The options are:

  • --recursive: follow links and download the site recursively (by default, up to five levels deep).

  • --domains website.org: don't follow links outside website.org.

  • --no-parent: don't follow links outside the directory tutorials/html/.

  • --page-requisites: get all the elements that compose the page (images, CSS and so on).

  • --html-extension: save HTML files with the .html extension (renamed --adjust-extension in wget 1.12).

  • --convert-links: convert links so that they work locally, off-line.

  • --restrict-file-names=windows: modify filenames so that they will work in Windows as well.

  • --no-clobber: don't overwrite any existing files (used in case the download is interrupted and
    resumed).
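
As a rough sketch of the same option set, the command can also be written with wget's short options. The helper name mirror_cmd is hypothetical, and it only prints the command line instead of running it, so nothing is downloaded; --restrict-file-names has no short form and is kept as is.

```shell
# Hypothetical helper: print (do not run) the short-option form of the
# mirror command. $1 = domain to stay within, $2 = start URL.
mirror_cmd() {
    # -r  = --recursive        -nc = --no-clobber
    # -p  = --page-requisites  -E  = --html-extension
    # -k  = --convert-links    -D  = --domains
    # -np = --no-parent
    echo wget -r -nc -p -E -k --restrict-file-names=windows -D "$1" -np "$2"
}

mirror_cmd website.org www.website.org/tutorials/html/
```

Once the printed line looks right, removing the echo makes the function run wget directly.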

Good work

On July 9th, 2009 Arjun Pakrashi (not verified) says:

I was always looking for suggestions on the appropriate switches to use when downloading a complete website. This document was very helpful. Thanks and keep up the good work.

I like lftp

On December 30th, 2008 process info (not verified) says:

ftp is very for me, I don't know much about Linux.
I hope I can learn more from great people like you.

Great

On October 2nd, 2008 partero says:

Very good instructions

My use

On September 15th, 2008 Luigimax (not verified) says:

Just to start, this post is most helpful. Dashamir Hoxha, thanks a lot!

The reason for writing this is that downloading multiple sites in sequence takes a long time, so I set this up to download multiple sites easily. And yes, it would be more efficient to put the pipe command in the script file.

What I'm using it for: downloading multiple websites (manga specifically).

Step 1: put the wget command in a script file (for ease of use)

#!/bin/bash
# Usage: meget <target website> <scan depth>
wget -r --page-requisites --convert-links --no-parent -l "$2" -U Mozilla "$1"

That is what I put in mine. I'll call mine "meget".
Make it executable with: chmod +x meget
How to use: [script-name] [target website] [scan depth]

Step 2:

Make a file listing all the websites you want to download, one per line. I'll call mine "zone".

Step 3: run this command:

cat zone | xargs -P 3 -I {} ./meget {} 1000

To increase the number of parallel downloads, change the 3 to whatever number you need. Keep in mind not to take a list of 300 sites and download them all at once; this may cause problems.

Be sure to also set the 1000 to the depth you need. In my case, to download a 1500-page manga I need to set it to 1500 or more.

While it is running, it will only show one download at a time. As long as it is still running, it will always show something.
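
The steps above can be tried out safely with a dry run before any download starts. This is only a sketch: the function names clean_list and dry_run are made up for illustration, and skipping blank lines and lines starting with # in the list file is an assumption added here, not part of the original tip. It relies on the -P option of xargs, which GNU and BSD xargs provide.

```shell
# Hypothetical helper: print the usable lines of a list file ($1),
# dropping blank lines and lines that start with '#'.
clean_list() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Hypothetical helper: print the ./meget commands that would be started
# (up to 3 at a time) for list file $1 and scan depth $2.
# Remove "echo" to actually run ./meget for each site.
dry_run() {
    clean_list "$1" | xargs -P 3 -I {} echo ./meget {} "$2"
}
```

Running dry_run zone 1000 prints one ./meget line per site in the list. Because the jobs run in parallel, the order of the printed lines is not guaranteed.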

Just what I've been looking for

On September 9th, 2008 TsueDesu (not verified) says:

Just this week I needed to make a site available offline so I can refer to it while working at home. And YaY!! I have wget and love using it already. However, I advise taking note of how wget saves the files: if it's a site with lots of PHP pages, you'll have to change every reference from .php to .php.html. Not to fear though, your computer can do the hard work for you. Just type

grep -rlZ --include='*.html' '\.php' . | xargs -0 perl -pi~ -e 's/\.php(?!\.html)/.php.html/g'

Et voilà! Your pages will open and link without a hitch... really interesting and marvelous, this Linux thing.
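
To see what such a rewrite does before touching real files, here is a small sketch that works on standard input. It swaps the Perl one-liner for plain sed, and the function name fix_php_links is made up for illustration. It first undoes any existing .php.html and then appends .html again, so running it twice on the same text does not produce .php.html.html. It is a blunt text substitution, so it would also touch any other text that contains ".php".

```shell
# Hypothetical helper: read HTML on stdin and write it to stdout with
# references to .php renamed to .php.html.
fix_php_links() {
    # Step 1 normalizes already-converted links back to .php,
    # step 2 then appends .html to every .php.
    sed -e 's/\.php\.html/.php/g' -e 's/\.php/.php.html/g'
}

# Sample input, invented for the demonstration.
printf '%s\n' '<a href="index.php">home</a>' | fix_php_links
# prints: <a href="index.php.html">home</a>
```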
