Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2009/09/04 21:00:58 UTC
[jira] Created: (NUTCH-751) Upgrade version of HttpClient
Upgrade version of HttpClient
------------------------------
Key: NUTCH-751
URL: https://issues.apache.org/jira/browse/NUTCH-751
Project: Nutch
Issue Type: Improvement
Components: fetcher
Reporter: Julien Nioche
The existing version of commons http-client (3.01) should be replaced with the latest version from http://hc.apache.org/.
Currently the only way of using the https protocol is to enable http-client. Version 3.01 is buggy and causes many issues that have been reported before. Apparently the new version has been redesigned and should fix them. The old v3.01 is too unstable to be used on a large scale.
I will try to send a patch in the next couple of weeks but would love to hear your thoughts on this.
J.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753175#action_12753175 ]
Julien Nioche commented on NUTCH-751:
-------------------------------------
Thanks for the pointer, Ken, which will be very useful when I start looking into this.
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751893#action_12751893 ]
Andrzej Bialecki commented on NUTCH-751:
-----------------------------------------
In general, if a new version of a third-party package doesn't cause regressions, then we should upgrade.
[jira] Resolved: (NUTCH-751) Upgrade version of HttpClient
Posted by "Julien Nioche (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche resolved NUTCH-751.
---------------------------------
Resolution: Later
The changes in the underlying API are quite substantial and this would need a bit of work. Maybe this could be done as part of crawler-commons? In the meantime I'll just mark it as 'later'.
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798890#action_12798890 ]
Ken Krugler commented on NUTCH-751:
-----------------------------------
I agree that this should be in crawler-commons. E.g. I've recently made changes to avoid synchronization bottlenecks with HttpClient 4.0, and identified a few places in HC where things should be improved.
Though I'm concerned that the level of customization each crawler wants could result in a pretty ugly ball of code. For example, in Bixo I'm looking at how to use a streaming disk buffer for reads, to avoid OOM errors when many threads are each reading big responses. How would that get implemented in a way that's friendly to Nutch, Droids & Heritrix?
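To make the streaming disk buffer idea concrete, here is a minimal sketch in plain Java. It is not Bixo's implementation: the class name SpillingBuffer, the memory limit parameter and the temp-file prefix are all hypothetical. It keeps a response in memory until it passes a size limit, then spills everything to a temporary file, so that many concurrent threads reading big responses do not all hold them on the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: buffers a response in memory, spilling to disk past a limit. */
final class SpillingBuffer implements AutoCloseable {
    private final int memoryLimit;                       // max bytes kept on the heap
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path spillFile;                              // null until the limit is exceeded
    private OutputStream spillOut;
    private long size;

    SpillingBuffer(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    /** Drains the stream into the buffer, switching to a temp file once the limit is passed. */
    void readFrom(InputStream in) {
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                if (spillOut == null && memory.size() + n > memoryLimit) {
                    // Move what was buffered so far to disk and free the heap copy.
                    spillFile = Files.createTempFile("fetch-", ".tmp");
                    spillOut = Files.newOutputStream(spillFile);
                    memory.writeTo(spillOut);
                    memory.reset();
                }
                if (spillOut != null) {
                    spillOut.write(chunk, 0, n);
                } else {
                    memory.write(chunk, 0, n);
                }
                size += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOnDisk() {
        return spillFile != null;
    }

    long size() {
        return size;
    }

    /** Opens the buffered content for reading, from disk or from memory. */
    InputStream openStream() {
        try {
            if (spillFile != null) {
                spillOut.flush();
                return Files.newInputStream(spillFile);
            }
            return new ByteArrayInputStream(memory.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the spill stream and deletes the temp file, if one was created. */
    @Override
    public void close() {
        try {
            if (spillOut != null) {
                spillOut.close();
            }
            if (spillFile != null) {
                Files.deleteIfExists(spillFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A fetcher thread would call readFrom on the response body stream and hand openStream() to the parser. How such a buffer would plug into each crawler's own content model is exactly the open question raised above.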
If we could define some least-common-denominator API, that would be a good starting point. E.g. here is the set of config values, here is the set of parameters required when making a request, and here's the format of the response from a request.
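As a rough illustration of what such a least-common-denominator API might look like, the sketch below separates the three pieces named above: config values, per-request parameters, and the response format. Every type and field name here (FetchConfig, FetchRequest, FetchResponse, PageFetcher, CannedFetcher) is hypothetical and does not exist in Nutch, Bixo, Droids, Heritrix or crawler-commons. The stub implementation returns a canned body only so that the shape can be exercised without a network.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

/** Hypothetical: config values shared by every request a fetcher makes. */
final class FetchConfig {
    final String userAgent;
    final int timeoutMillis;
    final int maxContentBytes;

    FetchConfig(String userAgent, int timeoutMillis, int maxContentBytes) {
        this.userAgent = userAgent;
        this.timeoutMillis = timeoutMillis;
        this.maxContentBytes = maxContentBytes;
    }
}

/** Hypothetical: parameters required when making a single request. */
final class FetchRequest {
    final String url;
    final Map<String, String> headers;

    FetchRequest(String url, Map<String, String> headers) {
        this.url = url;
        this.headers = headers;
    }
}

/** Hypothetical: format of the response from a request. */
final class FetchResponse {
    final int statusCode;
    final Map<String, String> headers;
    final byte[] content;
    final boolean truncated;   // true if content was cut at maxContentBytes

    FetchResponse(int statusCode, Map<String, String> headers,
                  byte[] content, boolean truncated) {
        this.statusCode = statusCode;
        this.headers = headers;
        this.content = content;
        this.truncated = truncated;
    }
}

/** Hypothetical: the one method every crawler would code against. */
interface PageFetcher {
    FetchResponse fetch(FetchRequest request) throws IOException;
}

/** Stub that ignores the network and serves a fixed body, honoring the size limit. */
final class CannedFetcher implements PageFetcher {
    private final FetchConfig config;
    private final byte[] body;

    CannedFetcher(FetchConfig config, byte[] body) {
        this.config = config;
        this.body = body;
    }

    @Override
    public FetchResponse fetch(FetchRequest request) {
        int n = Math.min(body.length, config.maxContentBytes);
        byte[] content = Arrays.copyOf(body, n);
        return new FetchResponse(200, Collections.<String, String>emptyMap(),
                                 content, n < body.length);
    }
}
```

An HttpClient 4.0 backed implementation would sit behind PageFetcher, which keeps the HttpClient types out of each crawler's own code. Whether a byte[] body is acceptable for large responses is one of the customization questions such an API would still have to settle.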
[jira] Commented: (NUTCH-751) Upgrade version of HttpClient
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753069#action_12753069 ]
Ken Krugler commented on NUTCH-751:
-----------------------------------
I'm using HttpClient 4.0 in Bixo, and I agree that Nutch should upgrade.
But the API has been changed significantly, as I'm sure Julien has seen. Lots of improvements, but this will be a non-trivial patch.
There was a recent (Sept 2nd) post on the HttpClient list by Gerald Turner, and a response by Oleg, that contained a lot of useful info about migrating from 3.1 to 4.0.