Re: [Hampshire] extracting phrases from a file.

Author: James Courtier-Dutton
Date:  
To: Hampshire LUG Discussion List
Subject: Re: [Hampshire] extracting phrases from a file.
On 12 September 2011 10:37, Alan Pope <alan@???> wrote:
> On 12 September 2011 10:17, James Courtier-Dutton
> <james.dutton@???> wrote:
>> I want to extract the "some url" bits. I.e. Remove the href.
>> You can probably do this quite easily in perl.
>> Are there any nice short programs to do this?
>> Is it easier to do in some other language?
>>
>
> lynx -dump --hiddenlinks=ignore foo.html
>
> Will dump it to stdout in plain text form with URLs removed.
>


Sorry, I was not very clear.
I wish to keep the "some url" bits and get rid of all the "some junk" bits.
I.e. I wish to keep only the URL inside each href and drop everything
else, including the literal href text itself.
I wish to end up with a file listing all the URLs, one per line.
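
For illustration, here is a minimal sketch of one way to do this in Python
rather than perl, using only the standard library's html.parser module. It
assumes the "some url" bits are the values of href attributes on <a> tags.
The names HrefCollector, extract_urls and extract_urls.py are made up for
this example, and foo.html is just the file name from the lynx command above.

```python
import sys
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the value of every href attribute found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip <a> tags without an href, or with an empty one.
            if name == "href" and value:
                self.urls.append(value)


def extract_urls(html_text):
    """Return the href values in html_text, in document order."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Assumed usage: python3 extract_urls.py foo.html > urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for url in extract_urls(handle.read()):
            print(url)
```

Everything outside the href values (the link text, the surrounding markup and
the rest of the page) is dropped, so redirecting the output to a file
should give a plain list of URLs, one per line. Relative links are printed as
written in the page; they are not resolved against any base URL.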