Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2009/11/26 09:24:49 UTC

Fulltext crawler

*Hey guys*, I'm looking for a full-text crawler for Solr to index HTML, OpenOffice
and MS Office documents, PDF, and many more formats.
How do you index your data?

Maybe you can help me find a crawler.

King

Re: Fulltext crawler

Posted by Christian Weyand <ch...@ekaabo.de>.

As far as I know, "Nutch" will satisfy your needs, although I haven't tested
it myself yet.
> *Hey guys*, I'm looking for a full-text crawler for Solr to index HTML, OpenOffice
> and MS Office documents, PDF, and many more formats.
> How do you index your data?
>
> Maybe you can help me find a crawler.
>
> King
>
>   


-- 
ekaabo GmbH

Christian Weyand
Entwickler
christian.weyand@ekaabo.de

Grundelbachstr. 84
69469 Weinheim
tel: +49-(0)6201-84520-0 (Zentrale)
fax: +49-(0)6201-84520-29
www.ekaabo.de

Amtsgericht Mannheim / HRB 701542
Geschäftsführer: Marco Ripanti

Re: Fulltext crawler

Posted by "Smiley, David W." <ds...@mitre.org>.

Start reading midway through page 224.

Additionally, you might want to get the online supplement available at packtpub.com.

FYI, my co-author Eric Pugh wrote the last 3 chapters, which include this.

~ David

On Nov 30, 2009, at 1:37 PM, Jörg Agatz wrote:

> Book? I ordered "Solr 1.4" today. Will I find some examples in this book?

Re: Fulltext crawler

Posted by Jörg Agatz <jo...@googlemail.com>.

Book? I ordered "Solr 1.4" today. Will I find some examples in this book?

Re: Fulltext crawler

Posted by "Smiley, David W." <ds...@mitre.org>.

And of course Heritrix: http://crawler.archive.org/
I think this one's quite cool.  You'll see example usage in my book.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Nov 26, 2009, at 5:01 AM, Shalin Shekhar Mangar wrote:

> On Thu, Nov 26, 2009 at 1:54 PM, Jörg Agatz <jo...@googlemail.com> wrote:
> 
>> *Hey guys*, I'm looking for a full-text crawler for Solr to index HTML, OpenOffice
>> and MS Office documents, PDF, and many more formats.
>> How do you index your data?
>>
>> Maybe you can help me find a crawler.
>> 
> 
> If you need a web crawler, look at Nutch. Otherwise, you may need to build
> something using Droids or Aperture.
> 
> http://lucene.apache.org/nutch/
> http://incubator.apache.org/droids/
> http://aperture.sourceforge.net/
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.

Re: Fulltext crawler

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Thu, Nov 26, 2009 at 1:54 PM, Jörg Agatz <jo...@googlemail.com> wrote:

> *Hey guys*, I'm looking for a full-text crawler for Solr to index HTML, OpenOffice
> and MS Office documents, PDF, and many more formats.
> How do you index your data?
>
> Maybe you can help me find a crawler.
>

If you need a web crawler, look at Nutch. Otherwise, you may need to build
something using Droids or Aperture.

http://lucene.apache.org/nutch/
http://incubator.apache.org/droids/
http://aperture.sourceforge.net/

-- 
Regards,
Shalin Shekhar Mangar.