Posted to user@nutch.apache.org by Fabrice Estiévenart <fa...@cetic.be> on 2009/08/12 09:51:39 UTC

Which Java objects to index a web page?

Hello,

How can I use Nutch's Java objects to index one web page (or a very small
set of pages) without crawling them?

Do I need to use the crawling tools (such as Injector, Generator, ...)
or can I do it with lower-level objects (Content, ParseResult, ...)?

Thanks for your help,

Fabrice

Re: Which Java objects to index a web page?

Posted by Fabrice Estiévenart <fa...@cetic.be>.
I like using Nutch for its crawlDB, scalability, threading, document
parsing, and so on, but crawling itself is not important to me because I
index targeted data sources.

Obviously, I'm using it with Solr for indexing and searching documents.

Fabrice

Alexander Aristov wrote:
> Nutch is primarily a crawler. I would suggest you take a look at Solr,
> which is just an indexer and searcher. You can use its API as well as open
> interfaces.
>
> Best Regards
> Alexander Aristov
>


-- 
Fabrice Estiévenart, R&D Engineer, CETIC
Tel: +32 (0)71/49.07.28
Web: http://www.cetic.be


Re: Which Java objects to index a web page?

Posted by Alexander Aristov <al...@gmail.com>.
Nutch is primarily a crawler. I would suggest you take a look at Solr,
which is just an indexer and searcher. You can use its API as well as open
interfaces.

Best Regards
Alexander Aristov
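
As a rough illustration of the "use Solr's API directly" suggestion, here is
a minimal sketch that builds a Solr XML update message and posts it over
HTTP. It is not taken from the thread. It assumes Java 11 or later (for
java.net.http), an update handler at a URL such as
http://localhost:8983/solr/update (the exact path depends on the Solr version
and core name), and a schema that defines the fields id, title and content.
The class name SolrUpdateSketch and its helper methods are hypothetical, and
the page text is assumed to have been extracted already by whatever parser
you use.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrUpdateSketch {

    // Escape the characters that are special in XML text and attribute values.
    // '&' is replaced first so the entities added afterwards are not re-escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build a Solr XML update message of the form
    // <add><doc><field name="...">...</field></doc></add>
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // POST an XML message to the update handler and return the HTTP status code.
    static int postXml(String updateUrl, String xml) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Field names are assumptions; they must match the Solr schema in use.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "http://www.example.org/page.html");
        doc.put("title", "Example page");
        doc.put("content", "Text extracted from the page by your own parser.");

        String xml = buildAddXml(doc);
        System.out.println(xml);

        // Only contact Solr when an update URL is given as the first argument.
        if (args.length > 0) {
            System.out.println("add status: " + postXml(args[0], xml));
            System.out.println("commit status: " + postXml(args[0], "<commit/>"));
        }
    }
}
```

Run without arguments, the program only prints the update message. Run with
the update URL as its first argument, it sends the document and then a
commit. This bypasses Nutch entirely, so fetching and parsing the page is
left to the caller.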

