Posted to user@nutch.apache.org by "Goldschmidt, Dave" <dg...@globalspec.com> on 2006/04/08 22:32:30 UTC

Add new content on the fly!

Hello,

Sorry if this topic has arisen before, but we're trying to enhance Nutch
to accept on-the-fly injections of new content.  In other words, we have
a crawler that feeds "page injection" commands to an HTTP server - this
server, in turn, adds the URL to the crawldb (if necessary), generates
the fetcher output, metadata, parsed content, etc. - then reindexes.
We're in the process of making this work.
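
As a rough illustration of the steps described above (add the URL to the crawldb only if it is not already known, then parse and reindex), here is a minimal in-memory sketch. Every name in it (InjectionServer, inject, search) is hypothetical, and none of it is Nutch's API; the crawldb is modelled as a plain set and the index as a dict of term -> URLs, with fetching and parsing reduced to a whitespace split.

```python
from dataclasses import dataclass, field


@dataclass
class InjectionServer:
    """Toy stand-in for the injection handler (hypothetical, not Nutch code)."""
    crawldb: set = field(default_factory=set)   # URLs already known
    index: dict = field(default_factory=dict)   # term -> set of URLs

    def inject(self, url: str, content: str) -> bool:
        """Handle one "page injection" command; return True if the URL was new."""
        is_new = url not in self.crawldb
        if is_new:
            self.crawldb.add(url)               # add to the crawldb only if necessary
        self._remove(url)                       # drop stale postings before reindexing
        for term in self._parse(content):
            self.index.setdefault(term, set()).add(url)
        return is_new

    @staticmethod
    def _parse(content: str) -> set:
        # Placeholder for the real fetch + parse stages (assumption).
        return set(content.lower().split())

    def _remove(self, url: str) -> None:
        # Iterate over a copy of the keys so entries can be deleted safely.
        for term in list(self.index):
            self.index[term].discard(url)
            if not self.index[term]:
                del self.index[term]

    def search(self, term: str) -> list:
        return sorted(self.index.get(term.lower(), ()))
```

Re-injecting a URL that is already known replaces its old postings, so a search reflects only the latest content for that page.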

Is this feasible on a large scale?  :-)   The business requirement
behind this is: company A has a search engine; company B pays company A
lots of money to include their content; company B expects injected
content to be available immediately.

I'm looking for constructive advice as to how to proceed - I'd be happy
to do the work to make this all happen, just need some guidance.

Thanks,

DaveG


Re: [Nutch-general] Add new content on the fly!

Posted by Kelvin Tan <ke...@relevanz.com>.
Dave, you could run a separate crawler to handle these ad hoc requests: perform the crawl, generate a small index, then merge it with the "live" index. That should also give a shorter turnaround time for the paying customers.
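
To make the idea concrete, here is a minimal sketch of merging a small, freshly built index into the live one. It is only an in-memory model under stated assumptions: build_index and merge_indexes are hypothetical names, not Nutch or Lucene calls, and an index is just a dict of term -> set of URLs. The one design choice it shows is that pages present in the new index replace their old entries in the live index.

```python
def build_index(pages):
    """Build a tiny inverted index (term -> set of URLs) from {url: text}."""
    index = {}
    for url, text in pages.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(url)
    return index


def merge_indexes(live, delta):
    """Return a new index combining live and delta; the inputs are not modified.

    URLs that appear in delta have their old postings removed from live,
    so re-injected pages are searchable only by their fresh content.
    """
    delta_urls = set().union(*delta.values())
    merged = {}
    for term, urls in live.items():
        kept = urls - delta_urls
        if kept:
            merged[term] = kept
    for term, urls in delta.items():
        merged.setdefault(term, set()).update(urls)
    return merged
```

Because merge_indexes returns a new index rather than modifying the live one, searches could keep using the old index until the merged one is swapped in.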

kelvin
