Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2002/04/20 16:10:04 UTC

Re: HTML parser

Laura,

Search the lucene-user and lucene-dev archives for things like:
crawler
spider
spindle
lucene sandbox

Spindle is something you may want to look at, as is MoJo (not mentioned
on the Lucene lists; use Google to find it).

Otis

> Has anyone solved the problem of recursively spidering web pages?

> > >While trying to research the same thing, I found the following...
> > >here's a good example of link extraction.....
> > 
> > Try http://www.quiotix.com/opensource/html-parser
> > 
> > It's easy to write a Visitor which extracts the links; should take
> > about ten lines of code.
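
To give a rough idea of what such a link-extracting visitor might look like,
here is a minimal sketch. It does not use the quiotix html-parser API; as an
assumption, it substitutes the callback-style HTML parser that ships with the
JDK (javax.swing.text.html), which follows the same idea of being called back
once per tag. The class name LinkExtractor and the method extractLinks are
hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href value of every <a> tag in a page.
public class LinkExtractor {

    public static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();

        // The callback plays the role of the "Visitor": the parser invokes
        // handleStartTag once for each opening tag it encounters.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        // 'true' tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>"
                + "<a href=\"http://lucene.apache.org/\">Lucene</a>"
                + "<a name=\"top\">anchor without href</a>"
                + "<a href=\"/docs/index.html\">Docs</a>"
                + "</body></html>";
        for (String link : extractLinks(new StringReader(page))) {
            System.out.println(link);
        }
    }
}
```

A recursive spider would feed each extracted link back into a fetch queue,
resolving relative links against the page URL and remembering which URLs it
has already visited; that part is left out of this sketch.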

