Posted to user@avro.apache.org by David Rosenstrauch <da...@darose.net> on 2010/08/13 00:40:23 UTC

How to work around MAPREDUCE-1700

Anyone have any ideas how I might be able to work around 
https://issues.apache.org/jira/browse/MAPREDUCE-1700 ?  It's quite a 
thorny issue!

I have an M/R job that's using Avro (v1.3.3).  Avro, in turn, has a 
dependency on Jackson (of which I'm using v1.5.4).  I'm able to add the 
jars to the distributed cache fine, and my Mapper starts to run and load 
Avro ... and then blammo:  "Error: 
org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;"

The problem is that an older (and obviously incompatible) version of 
Jackson (v1.0.1) is already included in the Hadoop distribution.  And 
since that appears earlier on the classpath than my Jackson jars, I get 
the error.

There doesn't seem to be any elegant solution to this.  I can't 
downgrade to an earlier version of Avro, as my code relies on features 
in the newer version.  And there doesn't seem to be any way 
configuration-wise to solve this either (i.e., tell Hadoop to use the 
newer Jackson jars for my M/R job, or to add those jars earlier on the 
classpath).

Near as I can tell, the only solutions involve doing a hack on each of 
my slave nodes.  I.e., either:

a) removing the existing jackson jars on each slave.  (Since I have no 
need for the Hadoop feature that requires Jackson.)

b) putting my newer jackson jars onto each slave in a place where they 
will be loaded before the older one (e.g., 
/usr/lib/hadoop-0.20/lib/aaaaa_jackson-core-asl-1.5.4.jar)

Either of these options is a bit of a hack - and error prone as well, 
since my job tasks will fail on any node that doesn't have this hack 
applied.
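For reference, option (b) relies on the Hadoop scripts building the classpath from a sorted glob of lib/*.jar, so an early-sorting filename wins. A quick sketch of just the ordering, using dummy files in a temp directory (the real path would be /usr/lib/hadoop-0.20/lib):

```shell
# Demonstrate why the "aaaaa_" prefix trick works: shell globs and
# directory listings sort lexicographically, so the renamed 1.5.4 jar
# is seen before the bundled 1.0.1 jar.
libdir=$(mktemp -d)
touch "$libdir/jackson-core-asl-1.0.1.jar"
touch "$libdir/aaaaa_jackson-core-asl-1.5.4.jar"
ls "$libdir" | head -n 1   # prints aaaaa_jackson-core-asl-1.5.4.jar
```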


Is there any cleaner way to resolve this issue that I'm not seeing?

Thanks,

DR

Re: How to work around MAPREDUCE-1700

Posted by Ted Yu <yu...@gmail.com>.
Hopefully Cloudera will make a build for your needs.
Until that happens, you can build your own installation.

We do this to produce HBase 0.20.6 + HBASE-2473

Cheers


Re: How to work around MAPREDUCE-1700

Posted by David Rosenstrauch <da...@darose.net>.
On 08/12/2010 07:02 PM, Ted Yu wrote:
> How about hack #3:
> maintain your own Hadoop installation, in which you replace the Jackson
> jar with the v1.5.4 jar?

Thanks for the reply Ted.

If I understand correctly, you're suggesting we keep our own customized 
hadoop installation, which we'd install on all the boxes in our cluster?

Not too ideal a solution, though, and arguably harder to maintain than 
the other hacks.  We've standardized on using Cloudera's CDH for all our 
Hadoop boxes, so this would really impose even more of an admin burden.

DR

Re: How to work around MAPREDUCE-1700

Posted by Ted Yu <yu...@gmail.com>.
How about hack #3:
maintain your own Hadoop installation, in which you replace the Jackson
jar with the v1.5.4 jar?


Re: How to work around MAPREDUCE-1700

Posted by Bo Shi <bs...@gmail.com>.
This tool might be a temporary band-aid for your pains:

http://code.google.com/p/jarjar/
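For example, a jarjar rules file that relocates Jackson's packages so they can't collide with Hadoop's bundled 1.0.1 jar might look like the sketch below. The jar names and target package are illustrative, and in practice you'd need to run the same rule over the Avro jar (and your own classes) so their references point at the relocated package:

```shell
# Sketch: shade Jackson into a private package with jarjar.
# The "rule" line uses jarjar's rules syntax; file names are assumptions.
cat > jackson-rules.txt <<'EOF'
rule org.codehaus.jackson.** shaded.org.codehaus.jackson.@1
EOF
# Repackage (requires jarjar.jar and the original jar on hand):
#   java -jar jarjar.jar process jackson-rules.txt \
#       jackson-core-asl-1.5.4.jar jackson-core-asl-1.5.4-shaded.jar
cat jackson-rules.txt
```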


Re: How to work around MAPREDUCE-1700

Posted by Tatu Saloranta <ts...@gmail.com>.

FWIW, it would be good to upgrade the Jackson version to something more
modern. The incompatibility in question is very unfortunate (and
obviously accidental; minor versions should not introduce such
incompatibilities). It was caused by changing the return type of
enable() from void to 'this' to allow method chaining, which is a
binary incompatibility, but usually not a source one. Upgrading to a
later version would be good anyway, because 1.0.x itself is no longer
maintained.
The safest bet would be the latest 1.4.x or 1.5.x release. If any
issues are found, please let me know and I can help.

Also: if one really wants to use the 1.0.x jar, recompiling the code
that uses it should be enough to make things work, since the specific
issue only affects binary compatibility.
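To make the void-vs-'this' point concrete, here is a minimal sketch with stand-in classes (not the actual Jackson source; the real types live in org.codehaus.jackson and take a JsonParser.Feature). Code compiled against the chaining version links to a method descriptor that the old jar simply doesn't contain, so the JVM throws NoSuchMethodError at runtime even though both versions compile from the same source:

```java
// Stand-in classes illustrating the binary incompatibility.
class OldStyleFactory {
    // Jackson 1.0.x shape: enable() returns void.
    public void enable(String feature) { /* set the flag */ }
}

class NewStyleFactory {
    // Jackson 1.5.x shape: enable() returns the factory for chaining.
    public NewStyleFactory enable(String feature) { return this; }
}

public class ReturnTypeCompat {
    public static void main(String[] args) {
        // A caller compiled against NewStyleFactory links against the
        // descriptor enable(Ljava/lang/String;)LNewStyleFactory;
        // If only the old class (descriptor enable(Ljava/lang/String;)V)
        // is on the classpath at runtime, the JVM cannot resolve the
        // call and throws NoSuchMethodError. Source code like the line
        // below compiles against either version; the bytecode does not.
        NewStyleFactory f = new NewStyleFactory().enable("a").enable("b");
        System.out.println(f != null);  // prints true
    }
}
```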

-+ Tatu +-

Re: How to work around MAPREDUCE-1700

Posted by Scott Carey <sc...@richrelevance.com>.
Yup, Hadoop's jar dependency management is poor.

I just repackaged hadoop and removed the jackson 1.0.1 jar from it, so no slaves have Jackson in their lib directory.  It's only used for the 'dump configuration in JSON format' feature, which I don't use (and a minor feature that pulls in a new jar should probably have gotten more scrutiny before being added).

Alternatively, since Jackson 1.x is backwards compatible to 1.0, one can replace 1.0.1 with 1.x.x on the slaves or in the package.

