Posted to mapreduce-user@hadoop.apache.org by Aaron Kimball <aa...@cloudera.com> on 2010/06/01 22:22:45 UTC

Re: Unable to read sequence file produced by MR job

James,

Which version of Hadoop are you using? HADOOP_CLASSPATH is specifically
intended to mean additional jars which are added to the list of jars Hadoop
itself adds to the classpath.
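Because HADOOP_CLASSPATH is appended to the classpath Hadoop builds for itself, it normally only needs your extra jars, not Hadoop's own lib directory. Below is a minimal bash sketch of adding one entry without discarding whatever is already set. The helper name add_to_hadoop_classpath is hypothetical and is not part of Hadoop.

```bash
# Hypothetical helper (not part of Hadoop): append one entry to
# HADOOP_CLASSPATH, keeping existing entries and avoiding an empty
# leading element when the variable starts out unset or empty.
add_to_hadoop_classpath() {
  local entry="$1"
  if [ -z "${HADOOP_CLASSPATH:-}" ]; then
    HADOOP_CLASSPATH="$entry"
  else
    HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${entry}"
  fi
  # Export so child processes such as bin/hadoop can see it.
  export HADOOP_CLASSPATH
}

# Example (placeholder path):
#   add_to_hadoop_classpath /path/to/shortdocwritables.jar
```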

What's the exact command-line you're trying?

- Aaron

On Fri, May 28, 2010 at 3:12 PM, James Hammerton <
james.hammerton@mendeley.com> wrote:

> Thanks.
>
> I've found that setting the classpath as you suggested gets the command to
> work, but at the expense of the "hadoop jar" command I use to submit jobs,
> which no longer finds the hadoop libraries!
>
> Even adding the directories onto the classpath along with the .jar file
> does not fix this. I've worked around it by running the command for viewing
> sequence files from within a script that sets the classpath. Any ideas how
> to get both the .jar file and the hadoop libraries into the classpath
> together so that both job submission and the other commands work?
>
> Regards,
>
> James
>
>
> On Thu, May 27, 2010 at 7:38 PM, Aaron Kimball <aa...@cloudera.com> wrote:
>
>> Put your jar on Hadoop's classpath:
>>
>> $ HADOOP_CLASSPATH=path/to/shortdocwritables.jar hadoop fs -text bla bla....
>>
>> - Aaron
>>
>>
>> On Thu, May 27, 2010 at 11:07 AM, James Hammerton <
>> james.hammerton@mendeley.com> wrote:
>>
>>> Hi,
>>>
>>> I tried using the "hadoop fs -text" command to read a sequence file
>>> generated by a map reduce job and got the following error:
>>>
>>> text: java.io.IOException: WritableName can't load class:
>>> com.mendeley.clusterer.title.ShortDocWritables
>>>
>>> ShortDocWritables is a Writable I created myself, and the sequence file
>>> contains these objects. How do I tell this command where to find the class?
>>>
>>> There was no trouble at all running the map reduce job that produced the
>>> file.
>>>
>>> James
>>>
>>> --
>>> James Hammerton | Senior Data Mining Engineer
>>> www.mendeley.com/profiles/james-hammerton
>>>
>>> Mendeley Limited | London, UK | www.mendeley.com
>>> Registered in England and Wales | Company Number 6419015
>>>
>>>
>>>
>>>
>>
>

Re: Unable to read sequence file produced by MR job

Posted by James Hammerton <ja...@mendeley.com>.

Ted, Aaron,

Thanks for your help.

I'm using HBase 0.20.3 (at some point we'll upgrade to 0.20.4).

It turns out that mendeley-all.jar, the jar I pass to the "hadoop jar
mendeley-all.jar ..." command, has the hbase and zookeeper jars inside it, so
without HADOOP_CLASSPATH set, hadoop finds everything it needs to run the job.

When I set HADOOP_CLASSPATH, it finds neither hbase.jar nor zookeeper.jar,
even though both are inside mendeley-all.jar (in its lib directory); listing
them explicitly in HADOOP_CLASSPATH fixes this.

I don't know why it finds the hbase jar inside mendeley-all.jar in one case
but not in the other.
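A likely explanation, not confirmed anywhere in this thread: "hadoop jar" unpacks the job jar and loads the main class through a classloader whose search path includes the jars under the job jar's lib/ directory. If the same job jar is also on HADOOP_CLASSPATH, that classloader's parent (the JVM's system classloader) can find the main class first, because Java classloaders normally ask their parent before searching their own URLs. Classes the main class refers to, such as org.apache.hadoop.hbase.client.Scan, are then looked up on the system classpath, which does not include the nested lib/ jars. Listing the HBase and ZooKeeper jars explicitly, as described above, sidesteps this. Below is a minimal bash sketch of assembling that list. join_classpath is a hypothetical helper, and every path in the example comment is a placeholder.

```bash
# Hypothetical helper: join its arguments with ':' into a single Java
# classpath string. IFS is made local so the caller's IFS is untouched.
join_classpath() {
  local IFS=':'
  printf '%s\n' "$*"
}

# Example (all paths are placeholders, not taken from the thread):
#   export HADOOP_CLASSPATH="$(join_classpath \
#       /path/to/mendeley-all.jar /path/to/hbase.jar /path/to/zookeeper.jar)"
#   hadoop jar /path/to/mendeley-all.jar com.mendeley.SomeClass arg1 arg2
```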

James

On Thu, Jun 3, 2010 at 5:16 PM, Ted Yu <yu...@gmail.com> wrote:

> By default, bin/hadoop wouldn't include hbase jars in CLASSPATH.
> How did the hbase client jar get included when HADOOP_CLASSPATH wasn't set?
>
> Just include the hbase client jar in HADOOP_CLASSPATH.
>
>
> On Thu, Jun 3, 2010 at 9:09 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> Which version of HBase are you using?
>>
>>
>> On Thu, Jun 3, 2010 at 2:25 AM, James Hammerton <
>> james.hammerton@mendeley.com> wrote:
>>
>>> I was running the commands from the same directory that foo.jar was
>>> located in. When I submit the job, foo.jar is found, but if HADOOP_CLASSPATH
>>> is set the HBase libraries for Hadoop aren't found. E.g. I get errors like:
>>>
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/hadoop/hbase/client/Scan
>>>
>>> Having played around a bit more now, it seems that pure Hadoop code that
>>> doesn't use HBase works, whilst code that uses HBase doesn't; yet the
>>> latter works if I don't set HADOOP_CLASSPATH. Perhaps this is an issue
>>> with my HBase install?
>>>
>>> James
>>>
>>>
>>> On Thu, Jun 3, 2010 at 3:52 AM, Ted Yu <yu...@gmail.com> wrote:
>>>
>>>> The four variations boil down to the same thing - specifying where
>>>> foo.jar is.
>>>> Did you specify the path to foo.jar?
>>>>
>>>> What error did you get when you tried to submit jobs with
>>>> HADOOP_CLASSPATH set?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Wed, Jun 2, 2010 at 3:08 PM, James Hammerton <
>>>> james.hammerton@mendeley.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The version is 0.20.1+169.88.
>>>>>
>>>>> To submit a job I type e.g.:
>>>>>
>>>>> hadoop jar foo.jar com.mendeley.SomeClass arg1 arg2...
>>>>>
>>>>> When the classpath is set I've also tried
>>>>>
>>>>> hadoop com.mendeley.SomeClass arg1 arg2...
>>>>>
>>>>> I'm using Ubuntu 9.10, with bash as my shell.
>>>>>
>>>>> To view a sequence file:
>>>>>
>>>>> hadoop fs -text foo.seq
>>>>>
>>>>> When I set the HADOOP_CLASSPATH, I've tried the following variations:
>>>>>
>>>>> export HADOOP_CLASSPATH=foo.jar
>>>>> export HADOOP_CLASSPATH=foo.jar:$HADOOP_CLASSPATH
>>>>> export HADOOP_CLASSPATH=foo.jar:/usr/lib/hadoop/lib/
>>>>> export HADOOP_CLASSPATH=foo.jar:/usr/lib/hadoop-0.20/lib/
>>>>>
>>>>> Hadoop was installed via the Ubuntu repositories, i.e. using apt-get.
>>>>> It seems I can either set the HADOOP_CLASSPATH or submit a job but not both.
>>>>>
>>>>> Regards,
>>>>>
>>>>> James
>>>>>
>>>>
>>>
>>

Re: Unable to read sequence file produced by MR job

Posted by James Hammerton <ja...@mendeley.com>.

Hi,

The version is 0.20.1+169.88.

To submit a job I type e.g.:

hadoop jar foo.jar com.mendeley.SomeClass arg1 arg2...

When the classpath is set I've also tried

hadoop com.mendeley.SomeClass arg1 arg2...

I'm using Ubuntu 9.10, with bash as my shell.

To view a sequence file:

hadoop fs -text foo.seq

When I set the HADOOP_CLASSPATH, I've tried the following variations:

export HADOOP_CLASSPATH=foo.jar
export HADOOP_CLASSPATH=foo.jar:$HADOOP_CLASSPATH
export HADOOP_CLASSPATH=foo.jar:/usr/lib/hadoop/lib/
export HADOOP_CLASSPATH=foo.jar:/usr/lib/hadoop-0.20/lib/
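One detail may matter for the last two variants. On a standard Java classpath, a plain directory entry such as /usr/lib/hadoop/lib/ is searched for .class files and resources, not for the .jar files inside it. Per Aaron's reply, Hadoop's own jars are added by bin/hadoop anyway. Java 6 and later also accept a dir/* wildcard that stands for every jar in a directory, but whether the hadoop wrapper script passes it through unexpanded is worth checking. A relative entry like foo.jar is resolved against the current working directory, so an absolute path is more robust. Below is a minimal bash sketch that expands a directory into an explicit, absolute list of jars. jars_in_dir is a hypothetical helper, and /path/to/hbase/lib is a placeholder.

```bash
# Hypothetical helper: print every *.jar directly inside DIR as one
# ':'-separated list of absolute paths (empty output if there are none).
jars_in_dir() {
  local dir abs jar out
  dir="$1"
  # CDPATH is cleared so cd prints nothing; pwd gives the absolute path.
  abs="$(CDPATH= cd -- "$dir" && pwd)" || return 1
  out=""
  for jar in "$abs"/*.jar; do
    [ -e "$jar" ] || continue   # glob matched nothing: skip the literal pattern
    out="${out:+$out:}$jar"
  done
  printf '%s\n' "$out"
}

# Example (placeholder path):
#   export HADOOP_CLASSPATH="$PWD/foo.jar:$(jars_in_dir /path/to/hbase/lib)"
```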

Hadoop was installed via the Ubuntu repositories, i.e. using apt-get. It
seems I can either set the HADOOP_CLASSPATH or submit a job but not both.
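One way to get both behaviours is sketched below under stated assumptions. Leave HADOOP_CLASSPATH unset in the login shell, so that "hadoop jar" behaves as it does without it. Then set the variable only for the commands that need the custom Writable classes, using bash's per-command "VAR=value command" form. This is essentially the per-invocation form from Aaron's first reply, or the script workaround mentioned earlier in the thread, wrapped in a function. hadoop_text and SHORTDOC_JAR are hypothetical names.

```bash
# Hypothetical: absolute path to the jar that holds the custom Writable.
SHORTDOC_JAR="${SHORTDOC_JAR:-$PWD/foo.jar}"

# Run "hadoop fs -text" with the jar prepended to HADOOP_CLASSPATH for this
# single invocation only; nothing is exported into the calling shell.
hadoop_text() {
  HADOOP_CLASSPATH="${SHORTDOC_JAR}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}" \
    hadoop fs -text "$@"
}

# Example:
#   hadoop_text foo.seq
#   hadoop jar foo.jar com.mendeley.SomeClass arg1 arg2   # unaffected
```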

Regards,

James
