[Hampshire] How can I mirror a webpage?

Author: Steve Kemp
Date:  
To: Hampshire LUG Discussion List
Subject: [Hampshire] How can I mirror a webpage?

A simple request which is confusing me mightily!

I'd like to download a remote webpage *including* any images, CSS
files, etc. that it requires, and rewrite those references so they
work in the local copy. This is usually simple stuff with wget, but
I'm running into problems because the initial page must be saved
under a fixed name.

wget seems to dislike my initial attempt:

    wget -O index.html --no-clobber --page-requisites \
     --convert-links --no-directories http://en.wikipedia.org/


The "--no-clobber" here, designed to avoid a file overwriting one
which already exists, stops things from working.
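
The only workaround I can see is to drop "-O" entirely, let wget
choose its own filename, and rename the result afterwards. A rough
sketch of that - the archive layout is my own convention, and the
filename guesswork at the end is decidedly fragile:

    #!/bin/sh
    # Sketch: mirror one bookmark into its own directory, letting
    # wget pick the top-level filename, then rename it to index.html.
    id="$1"     # bookmark identifier, e.g. "xx"
    url="$2"    # the bookmarked URL
    dir="/path/to/archives/$id"
    mkdir -p "$dir"

    # Without -O, --convert-links and --no-clobber behave normally;
    # everything lands flat in $dir via --directory-prefix.
    wget --page-requisites --convert-links --no-directories \
         --directory-prefix="$dir" "$url"

    # Guess what wget called the top page: the last path component,
    # or index.html when the URL ends in a slash. Query strings and
    # redirects will defeat this.
    top="${url##*/}"
    [ -n "$top" ] || top="index.html"
    if [ "$top" != "index.html" ] && [ -e "$dir/$top" ]; then
        mv "$dir/$top" "$dir/index.html"
    fi

The rename doesn't disturb the page requisites - their filenames are
untouched - though any link in the page pointing at itself would
still use the old name.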

curl seems to allow me to name files like -o "index_#1", but it
doesn't rewrite the page contents (images/CSS/etc.) to point at the
local copies.
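
For reference, the curl naming I mean - "#1" expands to whatever the
{} glob matched, with example.com here just a stand-in:

    # Fetches two pages as index_page1 and index_page2; nothing in
    # the saved HTML is rewritten to point at local copies.
    curl -o "index_#1" "http://example.com/{page1,page2}"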

(I'm trying to create archives of bookmarks in an online bookmark
application - so I want files for bookmark "xx" to be located in
/path/to/archives/xx/ - which is why I have to insist upon "index.html"
as the initial page.)

I guess I could use perl to fetch a URL's contents, parse it for
links, and then fetch those individually - but this seems like it
should be a solved problem... I looked at httrack too, but that
seemed confusingly complex.
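
For completeness, the hand-rolled route would look something like
the following - a crude sketch with a regex standing in for a real
HTML parser (HTML::LinkExtor or similar), and with relative links
left unresolved:

    # Fetch the page under a fixed name, scrape out src/href values,
    # then fetch each asset alongside it. Regex extraction is fragile
    # and relative URLs would still need resolving against $url.
    url="http://en.wikipedia.org/"
    wget -O index.html "$url"
    grep -Eo '(src|href)="[^"]*"' index.html \
        | sed -E 's/^[^"]*"([^"]*)"$/\1/' \
        | while read -r link; do
              wget --no-clobber "$link"
          done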

Steve
--
http://www.steve.org.uk/