Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2009/11/26 09:24:49 UTC
Fulltext crawler
*Hey guys*, I am looking for a full-text crawler for Solr to index HTML, OpenOffice
and MS Office documents, PDF, and many more formats.
How do you index your data?
Maybe you can help me find a crawler.
King
Re: Fulltext crawler
Posted by Christian Weyand <ch...@ekaabo.de>.
As far as I know, "Nutch" will satisfy your needs, although I haven't tested
it myself yet.
--
ekaabo GmbH
Christian Weyand
Entwickler
christian.weyand@ekaabo.de
Grundelbachstr. 84
69469 Weinheim
tel: +49-(0)6201-84520-0 (Zentrale)
fax: +49-(0)6201-84520-29
www.ekaabo.de
Amtsgericht Mannheim / HRB 701542
Geschäftsführer: Marco Ripanti
Re: Fulltext crawler
Posted by "Smiley, David W." <ds...@mitre.org>.
Start reading midway through page 224.
Additionally, you might want to get the online supplement available at packtpub.com.
FYI, my co-author Eric Pugh wrote the last 3 chapters, which include this.
~ David
On Nov 30, 2009, at 1:37 PM, Jörg Agatz wrote:
> The book? I ordered "Solr 1.4" today. Will I find some examples in this book?
Re: Fulltext crawler
Posted by Jörg Agatz <jo...@googlemail.com>.
The book? I ordered "Solr 1.4" today. Will I find some examples in this book?
Re: Fulltext crawler
Posted by "Smiley, David W." <ds...@mitre.org>.
And of course there is Heritrix: http://crawler.archive.org/
I think this one's quite cool. You'll see example usage in my book.
~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
On Nov 26, 2009, at 5:01 AM, Shalin Shekhar Mangar wrote:
> If you need a web crawler, look at Nutch. Otherwise, you may need to build
> something using Droids or Aperture.
>
> http://lucene.apache.org/nutch/
> http://incubator.apache.org/droids/
> http://aperture.sourceforge.net/
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Fulltext crawler
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
If you need a web crawler, look at Nutch. Otherwise, you may need to build
something using Droids or Aperture.
http://lucene.apache.org/nutch/
http://incubator.apache.org/droids/
http://aperture.sourceforge.net/
--
Regards,
Shalin Shekhar Mangar.
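
For readers wondering what "build something" could look like for files on disk, here is a minimal sketch in Python. It is not taken from any reply in this thread, and it does not use Nutch, Droids, Aperture, or Heritrix. It assumes a Solr instance with the extracting request handler (Solr Cell) mapped to /update/extract; the host, port, handler path, and the extension list are all assumptions to adapt to your setup. It walks a directory, picks files by extension, and builds one extract request per file, using the file path as the document id.

```python
import mimetypes
import os
import urllib.request
from urllib.parse import urlencode

# Assumed values, not from the thread: adjust to your own Solr setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
EXTENSIONS = {".html", ".pdf", ".doc", ".odt"}


def find_documents(root, extensions=EXTENSIONS):
    """Yield paths under root whose extension is in extensions (sorted walk)."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


def extract_request_url(path, base_url=SOLR_EXTRACT_URL):
    """Build the request URL, using the file path as the document id."""
    return base_url + "?" + urlencode({"literal.id": path})


def post_document(path, base_url=SOLR_EXTRACT_URL):
    """Send the raw file to Solr. Needs a running server; no commit is issued."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as handle:
        request = urllib.request.Request(
            extract_request_url(path, base_url),
            data=handle.read(),
            headers={"Content-Type": content_type},
        )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Dry run: only print what would be sent for the current directory.
    for document in find_documents("."):
        print(extract_request_url(document))
```

The dry run only prints the URLs. Calling post_document for each path would send the files, after which a separate commit is still needed before the documents become searchable.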