Re: [Hampshire] A php script to spider media files

Author: Adam Trickett
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] A php script to spider media files
On Sunday 20 Jan 2008, Ottavio Caruso wrote:
> Sirs,
>
> I need your help to write a php script that spiders a site for media files.


PHP wouldn't be my first choice; it's not really a general-purpose scripting
language. You will have better luck with Perl, Ruby or Python, or even plain
shell with sed, awk and wget.

> What I would like to achieve is: given two variables, $URL and $MEDIA,
> the script will crawl, starting from $URL and going, say, two or three
> levels down (or up), all links ending with .$MEDIA and print their URL
> on screen.
>
> Example: http://www.somesite.com , mpeg
>
> output:
>
> Found http://www.somesite.com/file1.mpeg
> Found http://www.somesite.com/file2.mpeg
> Found http://www.somesite.com/file3.mpeg
> Found http://www.somesite.com/file2.mpeg
> Found http://www.anothersite.co.uk/file1.mpeg
> Found http://www.anothersite.co.uk/file2.mpeg
> Found http://www.anothersite.co.uk/file3.mpeg
> Found http://www.anothersite.co.uk/file2.mpeg
>
> Something like that. Is that possible? I haven't touched php for ages.


That should be possible. How clever do you want it to get? Do you want to
include or exclude off-site links? Should it try to parse links constructed in
JavaScript? Would you be happy to use an existing download tool (I believe
several exist)? Can you install stuff on the box you are running this from, or
are there restrictions?
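For what it's worth, here is a rough Python sketch of the basic idea, using only the standard library. It assumes the links are plain href/src attributes in static HTML (nothing built by JavaScript). It follows off-site links by default, since your example output includes anothersite.co.uk, and treats "depth" as the number of link hops from the start page. The names `crawl`, `extract_links`, `same_site` and `spider.py` are made up for illustration; treat it as a starting point, not a finished tool.

```python
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect every href/src attribute value (a, img, embed, source, ...)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs, with #fragments stripped, found in a page."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, link))[0] for link in parser.links]


def fetch_html(url):
    """Download a page; return '' for non-HTML responses or any failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except Exception:  # unreachable host, HTTP error, bad URL, ...
        return ""


def crawl(start_url, media, max_depth=2, fetch=fetch_html, same_site=False):
    """Breadth-first crawl; return media URLs (each once) in the order found.

    Pages up to max_depth hops from start_url are fetched. With same_site=True
    only pages on the start host are followed, but media links on any host
    found on those pages are still reported.
    """
    suffix = "." + media.lower().lstrip(".")
    start_host = urlparse(start_url).netloc
    seen_pages = {start_url}
    found, seen_media = [], set()
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        for link in extract_links(fetch(url), url):
            parts = urlparse(link)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, ftp:, ...
            if parts.path.lower().endswith(suffix):
                if link not in seen_media:
                    seen_media.add(link)
                    found.append(link)
            elif depth < max_depth and link not in seen_pages:
                if same_site and parts.netloc != start_host:
                    continue
                seen_pages.add(link)
                queue.append((link, depth + 1))
    return found


def main(args):
    """Command-line entry point: spider.py URL MEDIA [DEPTH]."""
    if len(args) < 2:
        print("usage: spider.py URL MEDIA [DEPTH]", file=sys.stderr)
        return
    depth = int(args[2]) if len(args) > 2 else 2
    for media_url in crawl(args[0], args[1], depth):
        print("Found", media_url)


if __name__ == "__main__":
    main(sys.argv[1:])
```

You'd run it as `python spider.py http://www.somesite.com mpeg 2`. Each media URL is printed only once, so the repeated file2.mpeg lines in your example would collapse into one. It also has no delay between requests and no robots.txt check, which you'd want before pointing it at someone else's site.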

--
Adam Trickett
Overton, HANTS, UK

Any technology distinguishable from magic is insufficiently advanced
    -- anon