Downloading images

Programming — Alex @ 9:56 am

Every once in a while, I come across a site with loads of stuff that I’d like to download to my own computer without the hassle of clicking through it manually in a browser. Here’s a combined wget and Perl solution to the problem; since I rewrite essentially the same thing every time I run into this situation, I figured I might as well write it down once and for all. Basically, the Perl script rips the URLs out of a file (such as a bookmark file exported by Opera in HTML format), and the wget command downloads the files to a directory in a manner friendly to the remote web server (i.e., pausing between requests instead of hammering it). The Perl script assumes that you have at most one URL per line in the input file, written as a double-quoted href attribute and possibly surrounded by other text.

[perl]
while (<>) {
    # Non-greedy match up to the closing quote; print only when a link was found.
    if (m/href="(.*?)"/) {
        print "$1\n";
    }
}
[/perl]
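If an input file does put several links on one line, the extraction step can be swapped for a small shell pipeline. This is only a sketch, not part of the original script: the function name extract_urls is made up for illustration, and it assumes a grep that supports the -o option (GNU and BSD grep both do).

```shell
# extract_urls: print every double-quoted href value read from stdin,
# one per line. Hypothetical helper; unlike the Perl loop it reports
# all links on a line, not just one.
extract_urls() {
    # grep -o emits each match on its own line, e.g. href="http://..."
    # sed then strips the leading href=" and the trailing quote.
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}
```

It would be used in place of the Perl call, for example: extract_urls < input.html > get.lst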

Call the Perl script like this: perl extract_urls.pl input.html > get.lst. Then invoke wget like this: wget -w1 --random-wait -nH -nc -r -k -i get.lst. It waits about a second (randomized) between requests, and because of -r it also follows links from the pages it fetches. It will get all the files into the current directory and convert the links to point at the local copies, so you will probably want to run this process in an appropriately named subdirectory.
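To avoid cluttering the current directory, the download step can be wrapped in a small shell function that creates the subdirectory first. Again a sketch under assumptions: the name fetch_list and the WGET override variable are invented here for illustration, and wget is assumed to be installed.

```shell
# fetch_list DIR LIST: run the wget invocation from this post inside DIR.
# Hypothetical wrapper. Setting WGET=echo prints the command's arguments
# without downloading anything, which is handy as a dry run.
fetch_list() {
    dir=$1
    list=$2
    # Make the list path absolute so it still resolves after the cd below.
    case $list in
        /*) ;;
        *) list=$PWD/$list ;;
    esac
    mkdir -p "$dir" || return 1
    # Subshell: the caller's working directory is left unchanged.
    (
        cd "$dir" || exit 1
        # -w1 --random-wait: wait roughly one second between requests
        # -nH: no per-host directories; -nc: do not overwrite existing files
        # -r: recurse; -k: convert links for local viewing; -i: URL list file
        ${WGET:-wget} -w1 --random-wait -nH -nc -r -k -i "$list"
    )
}
```

For example, fetch_list holiday_photos get.lst would put everything under holiday_photos (a placeholder directory name).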


This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2008 ChapterZero | powered by WordPress with Barecity