Posted to user@nutch.apache.org by Nayanish Hinge <na...@gmail.com> on 2010/09/02 15:57:18 UTC

Custom HTTP status handling for throttling

Hi,
Some websites return HTTP 503 when they throttle requests.
I see that I need to re-implement HttpBase.java to handle this as a
special case and add retry logic (with exponential back-off).
But to get protocol-http and protocol-httpclient to use the new HttpBase,
we would need to override their plugin.xml.
Could we just update their plugin.xml and let them use our CustomHttpBase?

Has anyone tried this?
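
To make the retry idea above concrete, here is a minimal, self-contained sketch of retrying on HTTP 503 with exponential back-off. It is not Nutch code: the `Fetch` and `Sleeper` interfaces, the class name, and the delay values are all hypothetical stand-ins, and a real implementation would wrap whatever call actually performs the request.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class BackoffRetry {

    /** Hypothetical stand-in for one HTTP fetch; returns the status code. */
    public interface Fetch {
        int attempt() throws IOException;
    }

    /** Hypothetical sleep hook, so the wait can be replaced in tests. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Delay before retry number `retry` (0-based): base * 2^retry, capped at maxDelayMs. */
    public static long delayMs(int retry, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs;
        for (int i = 0; i < retry && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    /**
     * Calls fetch until it returns something other than 503 or until
     * maxRetries retries have been used; returns the last status seen.
     */
    public static int fetchWithBackoff(Fetch fetch, int maxRetries, long baseDelayMs,
                                       long maxDelayMs, Sleeper sleeper)
            throws IOException, InterruptedException {
        int status = fetch.attempt();
        for (int retry = 0; status == 503 && retry < maxRetries; retry++) {
            sleeper.sleep(delayMs(retry, baseDelayMs, maxDelayMs));
            status = fetch.attempt();
        }
        return status;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated server: throttles the first two requests, then answers 200.
        Fetch simulated = () -> calls.incrementAndGet() <= 2 ? 503 : 200;
        // The sleeper only prints here; a real one would call Thread.sleep(millis).
        int status = fetchWithBackoff(simulated, 5, 1000L, 60000L,
                millis -> System.out.println("backing off for " + millis + " ms"));
        System.out.println("final status " + status + " after " + calls.get() + " attempts");
    }
}
```

Capping the delay keeps a long run of 503s from stalling a fetcher thread indefinitely; the cap and the retry limit are tuning choices, not values taken from Nutch.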
Thanks
-- 
Nayanish
Software Development Engineer
Amazon
Hyderabad

Re: Custom HTTP status handling for throttling

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Nayanish,

Hmmm, why don't you just create a new plugin, protocol-httpretry, and implement your CustomHttpBase in there? You can reference the libraries and jars of other plugins with the plugin.xml "requires" directive, e.g.,

   <requires>
      <import plugin="lib-http"/>
   </requires>

In that requires section you would list the plugins yours depends on, and then implement your plugin's functionality. Finally, you'd activate your custom protocol-httpretry plugin by going to $NUTCH/conf/nutch-default.xml and adding it to the plugin.includes property.
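
As a rough illustration of the activation step, the sketch below assumes that plugin.includes is a regular expression that must match a plugin's id in full for the plugin to be loaded. The class name and both expression values are made up for the example and are not the defaults of any Nutch release.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    /** True if the plugin id is fully matched by the plugin.includes expression. */
    public static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Illustrative values only (assumption), before and after the edit.
        String before = "protocol-http|urlfilter-regex|parse-(text|html)";
        String after = "protocol-httpretry|urlfilter-regex|parse-(text|html)";

        System.out.println(isIncluded(before, "protocol-httpretry")); // false
        System.out.println(isIncluded(after, "protocol-httpretry"));  // true
        System.out.println(isIncluded(after, "protocol-http"));       // false
    }
}
```

Because the whole id has to match, an entry for protocol-http does not pull in protocol-httpretry; the new plugin id has to be listed explicitly.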

HTH,
Chris


On 9/12/10 12:12 AM, "Nayanish Hinge" <na...@gmail.com> wrote:

Could somebody give me a hint here, please?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Custom HTTP status handling for throttling

Posted by Nayanish Hinge <na...@gmail.com>.
Could somebody give me a hint here, please?

-- 
Nayanish
Hyderabad