Posted to dev@nutch.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/01/11 23:24:54 UTC

[jira] Commented: (NUTCH-751) Upgrade version of HttpClient

    [ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798890#action_12798890 ] 

Ken Krugler commented on NUTCH-751:
-----------------------------------

I agree that this should be in crawler-commons. E.g. I've recently made changes to avoid synchronization bottlenecks with HttpClient 4.0, and identified a few places in HC where things should be improved.

I'm concerned, though, that the level of customization each crawler wants could result in a pretty ugly ball of code. For example, in Bixo I'm looking at how to use a streaming disk buffer for reads, to avoid OOM errors when many threads are each handling big responses. How would that get implemented in a way that's friendly to Nutch, Droids, and Heritrix?
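
To make the disk-buffer idea concrete, here is a minimal sketch, not Bixo's actual code. It assumes a hypothetical SpillingBuffer class that keeps a response body in memory until it would exceed a per-response limit, then moves it to a temp file. The class name, the limit parameter, and the temp-file naming are all illustrative assumptions.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: holds a body in memory, or in a temp file once it grows too large. */
final class SpillingBuffer implements AutoCloseable {
    private final byte[] memory;  // non-null only if the body stayed in memory
    private final Path spillFile; // non-null only if the body was spilled to disk
    private final long size;

    private SpillingBuffer(byte[] memory, Path spillFile, long size) {
        this.memory = memory;
        this.spillFile = spillFile;
        this.size = size;
    }

    /** Drains the stream; switches to a temp file once more than memoryLimit bytes arrive. */
    static SpillingBuffer readFully(InputStream in, int memoryLimit) throws IOException {
        ByteArrayOutputStream mem = new ByteArrayOutputStream();
        Path file = null;
        OutputStream out = mem;
        long total = 0;
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                if (file == null && total + n > memoryLimit) {
                    // Spill: copy what is buffered so far, then keep writing to disk.
                    file = Files.createTempFile("fetch-", ".body");
                    out = new BufferedOutputStream(Files.newOutputStream(file));
                    mem.writeTo(out);
                }
                out.write(chunk, 0, n);
                total += n;
            }
        } finally {
            out.close();
        }
        // A fuller version would also delete the temp file if the read fails part-way.
        return file == null
                ? new SpillingBuffer(mem.toByteArray(), null, total)
                : new SpillingBuffer(null, file, total);
    }

    boolean isOnDisk() { return spillFile != null; }

    long size() { return size; }

    /** Opens a fresh stream over the buffered body. */
    InputStream openStream() throws IOException {
        return memory != null ? new ByteArrayInputStream(memory) : Files.newInputStream(spillFile);
    }

    /** Deletes the temp file, if one was created. */
    @Override
    public void close() throws IOException {
        if (spillFile != null) {
            Files.deleteIfExists(spillFile);
        }
    }
}
```

With N fetch threads and a limit of L bytes, response bodies held on the heap stay at roughly N x L (plus one read chunk per thread), whatever the response sizes. Whether a shared library should own this behavior or just expose a hook for it is exactly the open question.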

If we could define some least-common-denominator API, that would be a good starting point. E.g. here is the set of config values, here is the set of parameters required when making a request, and here's the format of the response from a request.
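
As a rough illustration of what such a least-common-denominator API might look like, here is a sketch with one type per item above: config values, request parameters, and response format. Every name and field here (FetchConfig, FetchRequest, FetchResponse, Fetcher, CannedFetcher) is a hypothetical placeholder, not an existing crawler-commons, Nutch, or HttpClient API. CannedFetcher exists only so the sketch runs without a network.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical config values every crawler would agree on. */
final class FetchConfig {
    final String userAgent;
    final int maxThreads;
    final int socketTimeoutMs;

    FetchConfig(String userAgent, int maxThreads, int socketTimeoutMs) {
        this.userAgent = userAgent;
        this.maxThreads = maxThreads;
        this.socketTimeoutMs = socketTimeoutMs;
    }
}

/** Hypothetical parameters required when making a request. */
final class FetchRequest {
    final String url;
    final Map<String, String> headers;

    FetchRequest(String url, Map<String, String> headers) {
        this.url = url;
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
    }
}

/** Hypothetical format of the response from a request. */
final class FetchResponse {
    final int statusCode;
    final Map<String, String> headers;
    final byte[] body;

    FetchResponse(int statusCode, Map<String, String> headers, byte[] body) {
        this.statusCode = statusCode;
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = body;
    }
}

/** The one operation all crawlers would share. */
interface Fetcher {
    FetchResponse fetch(FetchRequest request) throws IOException;
}

/** Stand-in implementation that returns a canned response instead of using the network. */
final class CannedFetcher implements Fetcher {
    private final FetchConfig config;

    CannedFetcher(FetchConfig config) {
        this.config = config;
    }

    @Override
    public FetchResponse fetch(FetchRequest request) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-Fetched-By", config.userAgent);
        byte[] body = ("stub body for " + request.url).getBytes(StandardCharsets.UTF_8);
        return new FetchResponse(200, headers, body);
    }
}
```

A byte[] body is the simplest possible response format, but it rules out streaming large responses. A real shared API would probably need a stream or buffer abstraction there, which is where the per-crawler customization concern comes back in.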


> Upgrade version of HttpClient 
> ------------------------------
>
>                 Key: NUTCH-751
>                 URL: https://issues.apache.org/jira/browse/NUTCH-751
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Julien Nioche
>
> The existing version of commons http-client (3.01) should be replaced with the latest version from http://hc.apache.org/.
> Currently the only way of using the https protocol is to enable http-client. Version 3.01 is buggy and causes a lot of issues which have been reported before. The new version has apparently been redesigned and should fix them. The old v3.01 is too unstable to be used on a large scale.
>  
> I will try to send a patch in the next couple of weeks but would love to hear your thoughts on this.
> J.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.