Posted to user@nutch.apache.org by Fabrice Estiévenart <fa...@cetic.be> on 2009/08/12 09:51:39 UTC
Which Java objects to index a web page ?
Hello,
How can I use Nutch Java objects to index one (or a very limited set of)
web page(s) without crawling them?
Do I need to use the crawling tools (such as Injector, Generator, ...)
or can I do it by means of lower-level objects (Content,
ParseResult, ...)?
Thanks for your help,
Fabrice
Re: Which Java objects to index a web page ?
Posted by Fabrice Estiévenart <fa...@cetic.be>.
I like using Nutch for the crawlDB, scalability, threading, document
parsing, ... but crawling is not important to me as I index targeted
data sources.
Obviously, I'm using it with Solr for indexing and searching documents.
Fabrice
Alexander Aristov wrote:
> Nutch is primarily a crawler. I would suggest you take a look at Solr,
> which is just an indexer and searcher. You may use its API as well as its
> open interfaces.
>
> Best Regards
> Alexander Aristov
--
Fabrice Estiévenart, R&D Engineer, CETIC
Tel: +32 (0)71/49.07.28
Web : http://www.cetic.be
Re: Which Java objects to index a web page ?
Posted by Alexander Aristov <al...@gmail.com>.
Nutch is primarily a crawler. I would suggest you take a look at Solr,
which is just an indexer and searcher. You may use its API as well as its
open interfaces.
Best Regards
Alexander Aristov
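
As an illustration of the "open interfaces" route, here is a minimal sketch (not taken from this thread) that builds the XML <add> message accepted by Solr's update handler for a single page, using only the JDK. The class name SolrAddMessage and the field names id, title and content are hypothetical; the real field names depend on the Solr schema in use. Obtaining the page text (for example with Nutch's parsers) and POSTing the message to the update handler are left out, since the handler URL depends on the deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a Solr XML <add> message for one document. */
final class SolrAddMessage {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps the given field/value pairs in <add><doc>...</doc></add>. */
    static String buildAdd(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
               .append(escapeXml(f.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Field names below are assumptions; they must match the Solr schema.
        Map<String, String> page = new LinkedHashMap<>();
        page.put("id", "http://www.example.org/");
        page.put("title", "Example page");
        page.put("content", "Text extracted from the page");
        System.out.println(buildAdd(page));
    }
}
```

The resulting message would be sent in an HTTP POST to the Solr update handler, followed by a <commit/> message so that the document becomes searchable.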