Posted to user@nutch.apache.org by Ahmad Al-Amri <am...@yahoo.com> on 2010/02/16 12:47:18 UTC

Inject and index single url

Hello,

I want to inject a single URL that is given as a string. I am thinking of adding a method to the Injector, something like this:

injector.injectUrl(crawlDb, "http://example.com");

instead of the current inject method, which I guess uses Hadoop's
FileInputFormat to read the URLs and inject them into the crawldb.
After that, I only need to index that URL; I guess I can just run the
current generator and the other steps with a depth of one.

What should I use to do this, and is there any other information I should know?

Also, would building a plug-in be more suitable for this?

Thank you.

Re: Inject and index single url

Posted by xiao yang <ya...@gmail.com>.
There's no good way to do this.
I'm waiting for HBase integration with Nutch, which will make this
operation much easier. As far as I know, the data store structure Nutch
uses now is not suitable for adding a single URL to the index.

Thanks!
Xiao
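
One possible workaround for the file-based injection described in the question is to write the URL string into a temporary seed directory and pass that directory to the existing inject step, rather than adding a new Injector method. The sketch below covers only the seed-file part and uses the Java standard library alone. The class name SingleUrlSeed, the method writeSeedDir, and the file name seed.txt are hypothetical names chosen for illustration. The hand-off to Nutch is left as a comment because it depends on the Nutch version in use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

// Hypothetical helper, not part of Nutch: turns a single URL string
// into a seed directory that a file-based inject step can read.
class SingleUrlSeed {

    // Creates a fresh temporary directory containing one text file
    // with the given URL on its own line, and returns the directory.
    static Path writeSeedDir(String url) throws IOException {
        Path dir = Files.createTempDirectory("nutch-seed-");
        Path seedFile = dir.resolve("seed.txt"); // file name is an arbitrary choice
        Files.write(seedFile, Collections.singletonList(url), StandardCharsets.UTF_8);
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path dir = writeSeedDir("http://example.com");
        System.out.println("Seed directory: " + dir);
        // Assumption: the directory is then passed to the stock inject step,
        // for example "bin/nutch inject <crawldb> <url_dir>" in Nutch 1.x.
        // Check the usage message of your Nutch version before relying on this.
    }
}
```

This only gets the URL into the crawldb through the normal path; generating, fetching, and indexing that one URL would still go through the usual crawl steps, so the limitation Xiao describes still applies.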
