Posted to dev@nutch.apache.org by d_k <ma...@gmail.com> on 2014/04/04 16:14:07 UTC

upgrading protocol-httpclient to httpclient 4.1.1

I've written a patch for the 2.2.1 source code that upgrades
protocol-httpclient to httpclient 4.1.1.

Unfortunately, I had to adjust the test because httpclient 4.1.1 currently
does not support authenticating with different credentials against
different realms in the same domain:
HTTPCLIENT-1490 <https://issues.apache.org/jira/browse/HTTPCLIENT-1490>.

The reason I picked version 4.1.1 and not the latest is that I noticed
it is already in the build/lib dir, and I wasn't sure I could use two versions
of the jar with the same namespace without creating conflicts.

My questions are:
1) Does anyone need this patch, or did I take the wrong path in choosing 4.1.1?
2) If it is needed, under which JIRA issue should I submit it? NUTCH-751?
NUTCH-1086? Something else? A new issue?

Re: upgrading protocol-httpclient to httpclient 4.1.1

Posted by d_k <ma...@gmail.com>.
I started reimplementing it with 4.3.3, and I thought I should consult the
list first to avoid rewriting it a third time. :-P

When using HttpClient, I would like to use the same HttpContext between
requests to maintain authentication state. Because the fetcher thread is
unaware of the specific protocol in use (and it shouldn't be aware of it),
I thought of making the HttpContext object thread-static in order to reuse
it between fetches.

I noticed the protocol object is usually cached, so I could simply make the
HttpContext a member of the class, but I wouldn't want to rely on the
protocol object being cached, because that might change in the future.

Is there a better way to maintain state/context in Protocol objects other
than thread-static members?
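
The thread-static idea described above can be sketched with a plain
java.lang.ThreadLocal. This is only an illustration under stated
assumptions, not code from the patch: FetchContext is a hypothetical
stand-in for HttpClient's HttpContext (so the sketch compiles without the
HttpClient jar), and PerThreadContext and the "auth.state" key are made-up
names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for HttpClient's HttpContext (essentially an
 * attribute map), so that this sketch has no external dependencies.
 */
final class FetchContext {
    private final Map<String, Object> attributes = new HashMap<>();

    Object getAttribute(String id) {
        return attributes.get(id);
    }

    void setAttribute(String id, Object value) {
        attributes.put(id, value);
    }
}

/** Hypothetical holder: one lazily created context per fetcher thread. */
final class PerThreadContext {
    private static final ThreadLocal<FetchContext> CONTEXT =
            ThreadLocal.withInitial(FetchContext::new);

    private PerThreadContext() {
    }

    /** Returns the calling thread's context, creating it on first use. */
    static FetchContext get() {
        return CONTEXT.get();
    }

    /** Drops the calling thread's context, e.g. when the thread finishes. */
    static void clear() {
        CONTEXT.remove();
    }
}

class PerThreadContextDemo {
    public static void main(String[] args) throws InterruptedException {
        // First "request" on this thread stores some state.
        PerThreadContext.get().setAttribute("auth.state", "main-thread");

        // A later "request" on the same thread sees the same context.
        System.out.println(PerThreadContext.get().getAttribute("auth.state"));

        // A different thread gets its own, initially empty context.
        Thread other = new Thread(() ->
                System.out.println(PerThreadContext.get().getAttribute("auth.state")));
        other.start();
        other.join();
    }
}
```

With HttpClient 4.3 the held type would presumably be HttpClientContext
rather than the stand-in. Keeping one context per thread also avoids sharing
a context between threads, and it does not depend on the protocol object
being cached. The cost is that the state lives as long as the thread does
unless clear() is called.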



Re: upgrading protocol-httpclient to httpclient 4.1.1

Posted by d_k <ma...@gmail.com>.
Alright. I'll look into it. Thanks!



Re: upgrading protocol-httpclient to httpclient 4.1.1

Posted by Sebastian Nagel <wa...@googlemail.com>.
> Define 'addressing'. :-)
> I didn't refactor because I don't really know which direction will be the
> right direction for that plugin. So in a way the plugin is still the same.
> All I did was to change all the API calls to httpclient 4.1.1 and check
> that the tests still run (it wasn't as easy as it sounds. :-P )

That's at least something. Unfortunately, I have never had a closer look at
the httpclient plugin and cannot estimate what level of rewriting is required.

> So what you are saying is that I can make the protocol-httpclient use the
> latest 4.3.x version without breaking anything?

Yes, it should be possible. It happens quite often that different versions
of a lib are used.

> So what do you say? Should I redo it with 4.2.6? Go straight for 4.3.x?
> I would like to be able to provide a patch for 2.2.1 users and trunk users
> considering i'm a 2.2.1 user myself.
> What would be the correct approach?

Go straight for 4.3.x, and do not depend indirectly on the Solr version.

> What exactly do I need to change and where?

src/plugin/protocol-httpclient/ivy.xml
 -> add as dependency
src/plugin/protocol-httpclient/plugin.xml
 -> add as library
 -> also add transitive dependencies

The best way is to have a look at another plugin, e.g.,
indexer-elastic.
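
As a rough illustration of the two changes listed above, the fragments
below show what the entries might look like for httpclient 4.3.3. This is a
sketch under assumptions, not the content of the actual patch: the
conf mapping is copied from how other plugins typically declare
dependencies, and the transitive jar names and versions (httpcore,
commons-logging, commons-codec) are what 4.3.3 is expected to pull in and
should be checked against what Ivy really resolves into the plugin's
build directory.

```xml
<!-- src/plugin/protocol-httpclient/ivy.xml (fragment): declare the
     dependency in the plugin's own Ivy file, not in the main ivy/ivy.xml -->
<dependencies>
  <dependency org="org.apache.httpcomponents" name="httpclient"
              rev="4.3.3" conf="*->default"/>
</dependencies>

<!-- src/plugin/protocol-httpclient/plugin.xml (fragment): list the plugin
     jar plus every jar the plugin class loader has to see, including
     transitive dependencies (versions here are assumptions) -->
<runtime>
  <library name="protocol-httpclient.jar">
    <export name="*"/>
  </library>
  <library name="httpclient-4.3.3.jar"/>
  <library name="httpcore-4.3.2.jar"/>
  <library name="commons-logging-1.1.3.jar"/>
  <library name="commons-codec-1.6.jar"/>
</runtime>
```

The jar names in plugin.xml have to match the file names that end up in the
plugin's directory after the build, so it is worth comparing them with the
resolved jars once the Ivy change is in place.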

> Will I still be able to use
> Eclipse or will it break because Eclipse won't know how to provide the
> correct dependency?

You have to update the dependencies:
- if you use IvyDE: add the ivy.xml as an IvyDE lib to the Java build path
- if you use "ant eclipse": change ivy.xml, close the Eclipse project,
   call "ant eclipse", open the project again and press F5 ("Refresh")

Sebastian



Re: upgrading protocol-httpclient to httpclient 4.1.1

Posted by d_k <ma...@gmail.com>.
On Fri, Apr 4, 2014 at 11:28 PM, Sebastian Nagel <wastl.nagel@googlemail.com> wrote:

> Hi,
>
> Does this mean you are (also) addressing NUTCH-1086? That would be great,
> since this issue has been waiting for a solution for a long time!
>

Define 'addressing'. :-)
I didn't refactor because I don't really know which direction will be the
right direction for that plugin. So in a way the plugin is still the same.
All I did was to change all the API calls to httpclient 4.1.1 and check
that the tests still run (it wasn't as easy as it sounds. :-P )


> > The reason I picked version 4.1.1 and not the latest is because I noticed
> > it is already in the build/lib dir and I wasn't sure I can use two versions
> > of the jar with the same namespace without creating conflicts.
>
> You should be able to use any version of httpclient, but it must be
> registered as a dependency in the plugin's ivy.xml
> (src/plugin/protocol-httpclient/ivy.xml),
> not in the "main" ivy/ivy.xml.
>

Actually, I didn't change any ivy.xml. I just changed the code to use the
new imports, and it must have picked up the dependencies by itself. I used
Eclipse, so maybe that has something to do with it.


> Each plugin gets its own class loader to solve the problem of conflicting
> dependencies, see
> https://wiki.apache.org/nutch/WhatsTheProblemWithPluginsAndClass-loading
>

So what you are saying is that I can make the protocol-httpclient use the
latest 4.3.x version without breaking anything?
What exactly do I need to change and where? Will I still be able to use
Eclipse or will it break because Eclipse won't know how to provide the
correct dependency?


> I didn't check 2.2.1, but in head of 2.x httpclient 4.2.6 is a dependency
> of a dependency (solrj) of the indexer-solr plugin. The upgrade has been done
> with NUTCH-1568.
>

So what do you say? Should I redo it with 4.2.6? Go straight for 4.3.x?
I would like to be able to provide a patch for both 2.2.1 users and trunk
users, considering I'm a 2.2.1 user myself.
What would be the correct approach?




Re: upgrading protocol-httpclient to httpclient 4.1.1

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

Does this mean you are (also) addressing NUTCH-1086? That would be great,
since this issue has been waiting for a solution for a long time!

> The reason I picked version 4.1.1 and not the latest is because I noticed
> it is already in the build/lib dir and I wasn't sure I can use two versions
> of the jar with the same namespace without creating conflicts.

You should be able to use any version of httpclient, but it must be
registered as a dependency in the plugin's ivy.xml
(src/plugin/protocol-httpclient/ivy.xml), not in the "main" ivy/ivy.xml.

Each plugin gets its own class loader to solve the problem of conflicting
dependencies, see
https://wiki.apache.org/nutch/WhatsTheProblemWithPluginsAndClass-loading

I didn't check 2.2.1, but in the head of 2.x, httpclient 4.2.6 is a dependency
of a dependency (solrj) of the indexer-solr plugin. The upgrade was done
with NUTCH-1568.

Sebastian
