Downloading images
Every once in a while, I come across a site with loads of stuff that I'd like to download to my own computer, without the hassle of clicking through it all manually in a browser. Here's a combined wget and Perl solution to the problem; since I rewrite essentially the same thing every time I run into this situation, I figured I might as well write it down once and for all. Basically, the Perl script rips all the URLs from a file (like a bookmark file exported by Opera in HTML format), and the wget command downloads the files to a directory in a manner friendly to the remote web server (i.e., waiting between requests instead of hammering it). The Perl script assumes that you have at most one URL per line in the input file, possibly surrounded by other text.
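[perl]
# Print the first href="..." value found on each input line.
while (<>) {
    # The non-greedy match stops at the first closing quote; lines without a link are skipped.
    print "$1\n" if m/href="(.*?)"/;
}
[/perl]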
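Call the Perl script like this: perl extract_urls.pl input.html > get.lst. Then invoke wget like this: wget -w1 --random-wait -nH -nc -r -k -i get.lst. Here -w1 and --random-wait pause for roughly a second, randomly varied, between requests; -nc skips files that already exist locally; -r follows links recursively; -nH leaves out the top-level hostname directory; -k converts the links in downloaded pages to point at the local copies; and -i reads the URLs from get.lst. Everything lands under the current directory, so you will probably want to run this in an appropriately named subdirectory.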
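If you would rather not keep a separate script file around, the two steps can be wrapped in a couple of shell functions. This is only a rough sketch: the function names extract_urls and fetch_list are made up for illustration, and the Perl one-liner uses a /g loop so that it also copes with several links on one line, which the script above does not attempt.

```shell
#!/bin/sh
# Sketch only: extract_urls and fetch_list are hypothetical helper names.

# Print every href="..." value found in the given HTML file, one per line.
# The /g loop picks up several links on the same line.
extract_urls() {
    perl -ne 'print "$1\n" while /href="(.*?)"/g' "$1"
}

# Download everything listed in the given file, using the same
# wget options as the command above.
fetch_list() {
    wget -w1 --random-wait -nH -nc -r -k -i "$1"
}
```

With these defined, extract_urls input.html > get.lst followed by fetch_list get.lst does the same job as the two commands above.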