Re: [Hampshire] A php script to spider media files

Author: Dr Adam J Trickett
Date:  
To: Ottavio Caruso
CC: Hants LUG
Subject: Re: [Hampshire] A php script to spider media files
On Mon, 21 Jan 2008 at 08:49:32AM +0000, Ottavio Caruso wrote:
> On Jan 20, 2008 4:56 PM, Adam Trickett wrote:
> > On Sunday 20 Jan 2008, Ottavio Caruso wrote:

<cuts/>
> >
> > > What I would like to achieve is: given two variables, $URL and $MEDIA,
> > > the script will crawl, starting from $URL and going, say, two or three
> > > levels down (or up), find all links ending with .$MEDIA and print their
> > > URLs on screen.
> > >
> > That should be possible. How clever do you want to get? Do you want to include
> > or exclude off-site links? Should it try to parse links constructed in
> > Javascript? Would you be happy to use an existing downloading tool - I
> > believe several exist? Can you install stuff on the box you are running this
> > from or are there restrictions?
>
> I have a basic free account with a free web provider and I can only
> have PHP or Perl. Perl is not my thing. Shell scripts won't work. I
> have my virtual Slackware at work (host: Window$), but I don't want to
> upset my boss by using wget and local spiders; I already have enough
> problems of my own. Obviously wget would be my first choice if I had
> my own internet connection.


Okay, what you want to do is a solved problem.

What you are talking about is not a script run from a shell; you want
to run a CGI process via a web page. Is that correct?
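
As a rough illustration of the crawl you describe, here is a minimal PHP
sketch. It is only a sketch: the function names (resolve_url, extract_links,
is_media, find_media) are made up for this example, the URL resolution is
deliberately simplistic, and it assumes the DOM extension is available on
your host. It follows off-site links as well, and it only lists the media
URLs rather than downloading them.

```php
<?php
// Hypothetical sketch: every function name below is illustrative only.

// Resolve $href against $base. Simplistic on purpose: handles absolute URLs,
// root-relative paths and plain relative paths, but not "../" or "//host".
function resolve_url($base, $href) {
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host']
          . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($href[0] === '/') {
        return $root . $href;
    }
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);   // directory of $base
    return $root . $dir . $href;
}

// Return every <a href> found in $html as an absolute URL.
function extract_links($html, $base) {
    if ($html === '') {
        return array();
    }
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $links = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = preg_replace('/#.*$/', '', trim($a->getAttribute('href')));
        // Skip empty links and non-HTTP schemes such as mailto: or javascript:
        if ($href === '' || preg_match('#^(?!https?:)[a-z][a-z0-9+.-]*:#i', $href)) {
            continue;
        }
        $links[] = resolve_url($base, $href);
    }
    return array_values(array_unique($links));
}

// True if the URL's path ends in ".$media" (case-insensitive); any query
// string is ignored.
function is_media($url, $media) {
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path)
        && strcasecmp(pathinfo($path, PATHINFO_EXTENSION), $media) === 0;
}

// Breadth-first crawl from $url, fetching pages up to $maxDepth links away.
// $fetch is any callable mapping a URL to its HTML, or false on failure.
function find_media($url, $media, $maxDepth, $fetch) {
    $seen = array($url => true);
    $found = array();
    $queue = array(array($url, 0));
    while ($queue) {
        list($page, $depth) = array_shift($queue);
        $html = call_user_func($fetch, $page);
        if (!is_string($html)) {
            continue;                       // fetch failed, move on
        }
        foreach (extract_links($html, $page) as $link) {
            if (isset($seen[$link])) {
                continue;                   // never visit a URL twice
            }
            $seen[$link] = true;
            if (is_media($link, $media)) {
                $found[] = $link;           // report it, do not download it
            } elseif ($depth < $maxDepth) {
                $queue[] = array($link, $depth + 1);
            }
        }
    }
    return $found;
}
```

Calling find_media('http://example.com/', 'mp3', 2, 'file_get_contents') and
echoing each result would use PHP's built-in HTTP wrapper, assuming your host
leaves allow_url_fopen switched on; if it does not, a fetcher built on
fsockopen() can be passed in instead. Because off-site links are followed,
the number of pages fetched grows quickly with the depth, so keep it small.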

> I'll look into something like fsockopen() and, yes, I need to include
> off-site links.
>
> In practice I am trying to build my own online media spider, but I
> am very lame.


There are plenty of modules on CPAN that will make writing this dead easy
in Perl; however, I suspect you won't have permission to install any.

I suspect that your constraints are a bit much here. Spidering needs
bandwidth, storage and a proper scripting environment, and I fear that
you don't have all of them. Why can't you do this properly from a Unix/Linux
desktop/server system? You'd be able to use any tool that way.

--
Adam Trickett
Overton, HANTS, UK

I guess that, if you're in Microsoft's shoes, it makes sense. If you
can't write software or protocols that can stably walk and chew gum,
program in a limit that prevents the user from telling it to do so.
-- Jonathan Patschke, on limitations in Active Directory