Posted to user@nutch.apache.org by lewis john mcgibbney <le...@apache.org> on 2016/10/18 06:38:38 UTC

Re: Nutch in production

Hi Sachin,
Answering both of your questions here as I am catching up with some mail.

On Fri, Sep 30, 2016 at 5:04 AM, <us...@nutch.apache.org> wrote:

>
> From: Sachin Shaju <sa...@mstack.com>
> To: user@nutch.apache.org
> Cc:
> Date: Fri, 30 Sep 2016 10:00:04 +0530
> Subject: Re: Nutch in production
> Thank you guys for your replies. I will look into the suggestions you gave.
> But I have one more query. How can I trigger nutch from a queue system in a
> distributed environment ?


Well, this is a bit more tricky of course. As per my other mailing list
thread, you can use the REST API and the NutchServer for publishing
Nutch workflows, so I would advise you to look into that.
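
As a rough sketch of what triggering Nutch from a queue consumer could look
like (not a definitive implementation): the code below turns one queue
message into a job request for the NutchServer. The base URL and port, the
/job/create path, the JSON field names (type, confId, crawlId, args), the
url_dir argument, and the message format are all assumptions to check
against the REST API documentation for your Nutch version. It also assumes
the seed directory is already available to the cluster.

```python
import json
import urllib.request

# Assumption: NutchServer is reachable at this address; adjust host and port.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(job_type, crawl_id, args=None, conf_id="default"):
    """Build the JSON body for a job request (field names are assumptions)."""
    return {
        "type": job_type,
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args or {},
    }


def submit_job(payload, base_url=NUTCH_SERVER):
    """POST the payload to the (assumed) /job/create endpoint, return the body."""
    request = urllib.request.Request(
        base_url + "/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def handle_queue_message(message, submit=submit_job):
    """Turn one queue message into an INJECT job request.

    The message format ({"crawl_id": ..., "seed_dir": ...}) is hypothetical;
    "url_dir" is the assumed name of the seed directory argument.
    """
    msg = json.loads(message)
    payload = build_job_request(
        "INJECT", msg["crawl_id"], {"url_dir": msg["seed_dir"]}
    )
    return submit(payload)
```

Whatever queue client you use would call handle_queue_message for each
message it receives; the submit parameter makes it possible to swap in a
stub when testing without a running server.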


> Can REST api be a real option in distributed mode
> ?


As per my other thread... yes :) The one limitation is getting the injected
URLs into HDFS for use within the rest of the workflow.


> Or whether I will have to go for a command line invocation for nutch ?
>
>
I think that we need to provide a patch for Nutch trunk to enable ingestion
of the injected seeds into HDFS via the REST API. Right now this
functionality is lacking. I've created a ticket for it at
https://issues.apache.org/jira/browse/NUTCH-2327

We will try to address this before the pending Nutch 1.13 release; however, I
cannot promise anything.
Thanks
Lewis