Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2015/06/16 16:23:02 UTC

mapreduce job using solrj 5

Hi, 

We recently started testing Solr 5. Our indexer creates a MapReduce job that
uses SolrJ 5 to index documents into our SolrCloud. Until now, we used Solr
4.10.3 with SolrJ 4.8.0. Our Hadoop distribution is Cloudera 5.

The problem is that SolrJ 5 uses httpclient-4.3.1 while Hadoop is installed
with httpclient-4.2.5. This is causing us jar hell: the Hadoop jars are loaded
first, and SolrJ uses the CloseableHttpClient class, which exists in 4.3.1 but
not in 4.2.5.

Has anyone encountered this? Is there a solution or a workaround?

Right now we are physically replacing the jar on each data node.
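
For anyone trying to confirm which httpclient jar actually wins inside a task, here is a minimal diagnostic sketch. It uses only the JDK and looks the classes up by name, so it compiles without httpclient on the build path. The class name JarOriginProbe and the method locate are made up for illustration; the two httpclient class names are the real ones (CloseableHttpClient was added in HttpClient 4.3, DefaultHttpClient is also present in 4.2.x).

```java
import java.security.CodeSource;

final class JarOriginProbe {

    /** Returns where the named class was loaded from, or a short note if it is unavailable. */
    public static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, JarOriginProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK itself) have no code source
            return source == null ? "bootstrap/JDK" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.http.impl.client.CloseableHttpClient", // 4.3 and later only
            "org.apache.http.impl.client.DefaultHttpClient"    // also present in 4.2.x
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling locate(...) from the mapper's setup and logging the result should show whether the 4.2.5 jar shipped with Hadoop or the 4.3.1 jar shipped with the job is being picked up on a given node.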





--
View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: mapreduce job using solrj 5

Posted by "Shenghua(Daniel) Wan" <wa...@gmail.com>.

Hadoop has a switch that lets you use your own jar rather than the one Hadoop
carries.
Google for HADOOP_OPTS.
Good luck.

-- 

Regards,
Shenghua (Daniel) Wan

Re: mapreduce job using solrj 5

Posted by "Shenghua(Daniel) Wan" <wa...@gmail.com>.

Check mapreduce.task.classpath.user.precedence and its equivalent property
in other Hadoop versions.
HADOOP_OPTS needs this property to be set to true in order to work.
I ran into a problem like yours, and playing with these parameters solved
it.
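
As a rough sketch of how such a property might be passed to a job: Hadoop's generic options accept "-D name=value" pairs when the driver parses them (for example through ToolRunner). The helper below only builds those arguments and uses nothing but the JDK. The class name UserClasspathFirstArgs is invented for illustration. The first property name is the one mentioned above; the second, mapreduce.job.user.classpath.first, is my assumption for the Hadoop 2.x equivalent, so check it against your distribution's mapred-default.xml.

```java
import java.util.ArrayList;
import java.util.List;

final class UserClasspathFirstArgs {

    /** Builds generic "-D name=true" options for each given property name. */
    public static List<String> build(String... propertyNames) {
        List<String> args = new ArrayList<>();
        for (String name : propertyNames) {
            args.add("-D");
            args.add(name + "=true");
        }
        return args;
    }

    public static void main(String[] ignored) {
        List<String> args = build(
            "mapreduce.task.classpath.user.precedence", // name given in this thread
            "mapreduce.job.user.classpath.first");      // assumed Hadoop 2.x equivalent
        // Place these before the application's own arguments on the command line
        System.out.println(String.join(" ", args));
    }
}
```

The generic options have to come before the job's own arguments, and they only take effect if the driver actually runs them through Hadoop's option parsing; whether that resolves the httpclient conflict on a given cluster still has to be tested.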



-- 

Regards,
Shenghua (Daniel) Wan

Re: mapreduce job using solrj 5

Posted by Mark Miller <ma...@gmail.com>.

I think there are some better classpath isolation options in the works for
Hadoop. As it is, some harmonization has to be done depending on the
versions used, and it can get tricky.

- Mark

-- 
- Mark
about.me/markrmiller

Re: mapreduce job using solrj 5

Posted by Erick Erickson <er...@gmail.com>.

For sure there are a few rough edges here....

Re: mapreduce job using solrj 5

Posted by adfel70 <ad...@gmail.com>.

We cannot downgrade HttpClient in SolrJ 5 because it uses new features, and
we don't want to start altering Solr code. We also thought about upgrading
HttpClient in Hadoop, but as Erick said, it sounds like more work than just
putting the jar on the data nodes.

About that flag: we tried it. Hadoop even has an environment variable,
HADOOP_USER_CLASSPATH_FIRST, but all our tests with that flag failed.

We think this is an issue that Solr users are more likely to encounter than
Cloudera users, so we would be glad to hear of a more elegant solution or
workaround than replacing the httpclient jar on the data nodes.

Thank you all for your responses.



--
View this message in context: http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199p4212350.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: mapreduce job using solrj 5

Posted by Shawn Heisey <ap...@elyograg.org>.

On 6/16/2015 9:24 AM, Erick Erickson wrote:
> Sounds like a question better asked in one of the Cloudera support
> forums, 'cause all I can do is guess ;).
>
> I suppose, theoretically, that you could check out the Solr5
> code and substitute the httpclient-4.2.5.jar in the build system,
> recompile and go, but that's totally a guess based on zero knowledge
> of whether compiling Solr with an earlier httpclient would even work.
> Frankly, though, that sounds like more work than distributing the older
> jar to the data nodes.
>
> Best,
> Erick
>

In addition to what Erick said: when I upgraded the build system in
Solr from HttpClient 4.2 to 4.3, no code changes were required. It
worked immediately, and all tests passed. It is likely that you can
simply use HttpClient 4.3.1 everywhere and Hadoop will work properly.
This is one of Apache's design goals for software libraries. It's not
always possible to achieve it, but it is something we always try to do.

Thanks,
Shawn

Re: mapreduce job using solrj 5

Posted by Erick Erickson <er...@gmail.com>.

Sounds like a question better asked in one of the Cloudera support
forums, 'cause all I can do is guess ;).

I suppose, theoretically, that you could check out the Solr5
code and substitute the httpclient-4.2.5.jar in the build system,
recompile and go, but that's totally a guess based on zero knowledge
of whether compiling Solr with an earlier httpclient would even work.
Frankly, though, that sounds like more work than distributing the older
jar to the data nodes.

Best,
Erick
