Posted to dev@hbase.apache.org by lars hofhansl <lh...@yahoo.com> on 2012/02/23 03:36:08 UTC

Some HBase M/R confusion

According to the documentation there are two ways to run HBase M/R jobs:


1. The HBase book states to run M/R jobs like export here: http://hbase.apache.org/book/ops_mgt.html#export
bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

2. Whereas the Javadoc says here: http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar export ...


In the first case (#1) I find that the job always fails to create the output dir:
java.io.IOException: Mkdirs failed to create file:/exports/_temporary/_attempt_local_0001_m_000000_0
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)

...


In the 2nd case (#2) I get past the creation of the output dir, and then it fails because it cannot find class com.google.protobuf.Message.
I am using the HBase security branch and find that I need to add com.google.protobuf.Message.class in TableMapReduceUtil.addDependencyJars.
If I do that, I can successfully run an export job using method #2.
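
Concretely, the change has the same effect as shipping the protobuf jar from job-setup code, roughly like this (a sketch against the 0.92-era TableMapReduceUtil API; the ExportJobSetup class and the job setup around the call are illustrative, not HBase code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class ExportJobSetup {
      public static Job createJob() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "export");
        // addDependencyJars(Configuration, Class...) locates the jar each
        // class was loaded from and adds it to the job's tmpjars, so the
        // task JVMs can see com.google.protobuf.Message at runtime.
        TableMapReduceUtil.addDependencyJars(job.getConfiguration(),
            com.google.protobuf.Message.class);
        return job;
      }
    }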


The 2nd issue I found looks like a bug in the HBase security branch.
I am not sure about the first issue; is the documentation in the HBase book outdated?


-- Lars


Re: Some HBase M/R confusion

Posted by lars hofhansl <lh...@yahoo.com>.
Then we should rename it already :)

On the site it is still called "The HBase Book".


For the record: for the first approach I had forgotten to add hadoop/conf to my hbase classpath.
It also works if I add -conf hadoop/conf/core-site.xml to the export command, as in the sketch below.
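
A minimal sketch of both fixes (the table name, output path, and HADOOP_HOME location are illustrative; adjust for your install):

    # Fix A (sketch): put the Hadoop conf dir on the HBase classpath so
    # bin/hbase picks up the cluster config instead of the local defaults.
    HBASE_CLASSPATH=${HADOOP_HOME}/conf bin/hbase \
        org.apache.hadoop.hbase.mapreduce.Export mytable /exports/mytable

    # Fix B (sketch): pass the Hadoop config explicitly; Export reads
    # generic options such as -conf via GenericOptionsParser.
    bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
        -conf ${HADOOP_HOME}/conf/core-site.xml mytable /exports/mytable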


-- Lars



________________________________
 From: Stack <st...@duboce.net>
To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Wednesday, February 22, 2012 9:20 PM
Subject: Re: Some HBase M/R confusion
 
On Wed, Feb 22, 2012 at 9:14 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Either way we need to update the book I think.
>

It's not 'the book'.  It's the 'reference guide'.  The former is published by
O'Reilly, the latter by our man Doug Meil.


> As for the protobufs: this is trunk, and it looks like this is related to HBASE-5394. It happens in the non-secure branch as well.
> Filed HBASE-5460. I assume we just add protobufs as a jar dependency; I will do that tonight.
>

Yeah, that should do it (above sounds reasonable).

St.Ack

Re: book -> 'ref guide' rename...

Posted by Doug Meil <do...@explorysmedical.com>.
It's official... it is now The Reference Guide.





On 2/23/12 2:47 PM, "Doug Meil" <do...@explorysmedical.com> wrote:

>
>Hi folks-
>
>Regarding the rename, the deployment of the book rename will happen by end
>of day.  
>
>https://issues.apache.org/jira/browse/HBASE-5465
>
>The files on the website will still have the same names (e.g., book.html,
>/book/book.html), so this is content-only at this point.
>
>
>
>On 2/23/12 12:20 AM, "Stack" <st...@duboce.net> wrote:
>
>>On Wed, Feb 22, 2012 at 9:14 PM, lars hofhansl <lh...@yahoo.com>
>>wrote:
>>> Either way we need to update the book I think.
>>>
>>
>>It's not 'the book'.  It's the 'reference guide'.  The former is published by
>>O'Reilly, the latter by our man Doug Meil.
>>
>>
>> As for the protobufs: this is trunk, and it looks like this is related
>>to HBASE-5394. It happens in the non-secure branch as well.
>> Filed HBASE-5460. I assume we just add protobufs as a jar dependency; I
>>will do that tonight.
>>>
>>
>>Yeah, that should do it (above sounds reasonable).
>>
>>St.Ack
>>
>
>
>



book -> 'ref guide' rename...

Posted by Doug Meil <do...@explorysmedical.com>.
Hi folks-

Regarding the rename, the deployment of the book rename will happen by end
of day.  

https://issues.apache.org/jira/browse/HBASE-5465

The files on the website will still have the same names (e.g., book.html,
/book/book.html), so this is content-only at this point.



On 2/23/12 12:20 AM, "Stack" <st...@duboce.net> wrote:

>On Wed, Feb 22, 2012 at 9:14 PM, lars hofhansl <lh...@yahoo.com>
>wrote:
>> Either way we need to update the book I think.
>>
>
>It's not 'the book'.  It's the 'reference guide'.  The former is published by
>O'Reilly, the latter by our man Doug Meil.
>
>
>> As for the protobufs: this is trunk, and it looks like this is related
>>to HBASE-5394. It happens in the non-secure branch as well.
>> Filed HBASE-5460. I assume we just add protobufs as a jar dependency; I
>>will do that tonight.
>>
>
>Yeah, that should do it (above sounds reasonable).
>
>St.Ack
>



Re: Some HBase M/R confusion

Posted by Stack <st...@duboce.net>.
On Wed, Feb 22, 2012 at 9:14 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Either way we need to update the book I think.
>

It's not 'the book'.  It's the 'reference guide'.  The former is published by
O'Reilly, the latter by our man Doug Meil.


> As for the protobufs: this is trunk, and it looks like this is related to HBASE-5394. It happens in the non-secure branch as well.
> Filed HBASE-5460. I assume we just add protobufs as a jar dependency; I will do that tonight.
>

Yeah, that should do it (above sounds reasonable).

St.Ack

Re: Some HBase M/R confusion

Posted by lars hofhansl <lh...@yahoo.com>.
Thanks Stack.

Missed the "file:" part in the first case... Stupid... It must pick up an hbase-site.xml from somewhere else (or, more likely, just use the defaults because it can't find one).

Either way we need to update the book I think.


As for the protobufs: this is trunk, and it looks like this is related to HBASE-5394. It happens in the non-secure branch as well.
Filed HBASE-5460. I assume we just add protobufs as a jar dependency; I will do that tonight.

-- Lars



________________________________
 From: Stack <st...@duboce.net>
To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Wednesday, February 22, 2012 8:59 PM
Subject: Re: Some HBase M/R confusion
 
On Wed, Feb 22, 2012 at 6:36 PM, lars hofhansl <lh...@yahoo.com> wrote:
> 1. The HBase book states to run M/R jobs like export here: http://hbase.apache.org/book/ops_mgt.html#export
> bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
>

This is running the Export tool, i.e. the Export class's main.  The
CLASSPATH is that built by bin/hbase.


> 2. Whereas the Javadoc says here: http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar export ...
>

Here we're loading the HADOOP_CLASSPATH with hbase classpath.  We then
pass the hbase.jar as a 'mapreduce fat jar' for bin/hadoop to run.
When we build our hbase.jar, we set its Main-Class to be the Driver
class under mapreduce.  The Driver parses the args to figure out which of
our selection of common mapreduce programs to run.  Here you've chosen
export (leave off the 'export' arg to see the complete list).

Either means should work, but #2 is a bit more palatable (excepting the
ugly CLASSPATH preamble).
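
For example, running the jar with no program name makes the Driver print its menu of bundled tools (a sketch; the 0.90.0 jar name is taken from the javadoc quoted above, and the program list shown is only indicative):

    HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
        ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar
    # Prints the valid program names, e.g. export, import, rowcounter, ...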


> In the first case (#1) I find that the job always fails to create the output dir:
> java.io.IOException: Mkdirs failed to create file:/exports/_temporary/_attempt_local_0001_m_000000_0
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
>

It's running locally?  It's trying to write to /exports on your local
disk?  It's probably not picking up the hadoop configs and so is using
local mapreducing.
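
One way to confirm (a sketch; FsShell simply reports whichever filesystem the classpath's config points at):

    # If this lists your local root rather than HDFS, the Hadoop configs
    # are not on the hbase classpath: fs.default.name is falling back to
    # the file:/// default, and mapred.job.tracker to 'local', which is
    # why the attempt id reads attempt_local_0001_...
    bin/hbase org.apache.hadoop.fs.FsShell -ls /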


> In the 2nd case (#2) I get past the creation of the output dir, and then it fails because it cannot find class com.google.protobuf.Message.

It's not adding protobufs to the CLASSPATH?  Or the versions disagree?  The
hbase-included protobuf jar is being found first and it's not the one
Hadoop's protobuf code wants?


> I am using the HBase security branch and find that I need to add com.google.protobuf.Message.class in TableMapReduceUtil.addDependencyJars.
> If I do that, I can successfully run an export job using method #2.
>

This is probably a bug.

Is this 0.92.x?  Or trunk?  Is protobuf a new dependency hbase needs?

>
> The 2nd issue I found looks like a bug in the HBase security branch.
> I am not sure about the first issue; is the documentation in the HBase book outdated?
>

I think yeah, we should encourage #2, since it will use the proper config
and find the cluster.  We would have to add the hadoop config to #1 to make
it work.

My guess is it's not just the security branch.

St.Ack

Re: Some HBase M/R confusion

Posted by Stack <st...@duboce.net>.
On Wed, Feb 22, 2012 at 6:36 PM, lars hofhansl <lh...@yahoo.com> wrote:
> 1. The HBase book states to run M/R jobs like export here: http://hbase.apache.org/book/ops_mgt.html#export
> bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
>

This is running the Export tool, i.e. the Export class's main.  The
CLASSPATH is that built by bin/hbase.


> 2. Whereas the Javadoc says here: http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar export ...
>

Here we're loading the HADOOP_CLASSPATH with hbase classpath.  We then
pass the hbase.jar as a 'mapreduce fat jar' for bin/hadoop to run.
When we build our hbase.jar, we set its Main-Class to be the Driver
class under mapreduce.  The Driver parses the args to figure out which of
our selection of common mapreduce programs to run.  Here you've chosen
export (leave off the 'export' arg to see the complete list).

Either means should work, but #2 is a bit more palatable (excepting the
ugly CLASSPATH preamble).


> In the first case (#1) I find that the job always fails to create the output dir:
> java.io.IOException: Mkdirs failed to create file:/exports/_temporary/_attempt_local_0001_m_000000_0
>     at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
>

It's running locally?  It's trying to write to /exports on your local
disk?  It's probably not picking up the hadoop configs and so is using
local mapreducing.


> In the 2nd case (#2) I get past the creation of the output dir, and then it fails because it cannot find class com.google.protobuf.Message.

It's not adding protobufs to the CLASSPATH?  Or the versions disagree?  The
hbase-included protobuf jar is being found first and it's not the one
Hadoop's protobuf code wants?


> I am using the HBase security branch and find that I need to add com.google.protobuf.Message.class in TableMapReduceUtil.addDependencyJars.
> If I do that, I can successfully run an export job using method #2.
>

This is probably a bug.

Is this 0.92.x?  Or trunk?  Is protobuf a new dependency hbase needs?

>
> The 2nd issue I found looks like a bug in the HBase security branch.
> I am not sure about the first issue; is the documentation in the HBase book outdated?
>

I think yeah, we should encourage #2, since it will use the proper config
and find the cluster.  We would have to add the hadoop config to #1 to make
it work.

My guess is it's not just the security branch.

St.Ack

Re: Some HBase M/R confusion

Posted by lars hofhansl <lh...@yahoo.com>.
I saw a bunch of security-related classes in the trace. I'll try with a non-secure branch and file a jira if the problem is not present there.
Any input on the first issue?


-- Lars



________________________________
 From: Ted Yu <yu...@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Wednesday, February 22, 2012 7:43 PM
Subject: Re: Some HBase M/R confusion
 

Lars:
Is the second problem present in non-secure HBase?

Thanks


On Wed, Feb 22, 2012 at 6:36 PM, lars hofhansl <lh...@yahoo.com> wrote:

According to the documentation there are two ways to run HBase M/R jobs:
>
>
>1. The HBase book states to run M/R jobs like export here: http://hbase.apache.org/book/ops_mgt.html#export
>bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
>
>2. Whereas the Javadoc says here: http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar export ...
>
>
>In the first case (#1) I find that the job always fails to create the output dir:
>java.io.IOException: Mkdirs failed to create file:/exports/_temporary/_attempt_local_0001_m_000000_0
>    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
>
>...
>
>
>In the 2nd case (#2) I get past the creation of the output dir, and then it fails because it cannot find class com.google.protobuf.Message.
>I am using the HBase security branch and find that I need to add com.google.protobuf.Message.class in TableMapReduceUtil.addDependencyJars.
>If I do that, I can successfully run an export job using method #2.
>
>
>The 2nd issue I found looks like a bug in the HBase security branch.
>I am not sure about the first issue; is the documentation in the HBase book outdated?
>
>
>-- Lars
>
>

Re: Some HBase M/R confusion

Posted by Ted Yu <yu...@gmail.com>.
Lars:
Is the second problem present in non-secure HBase?

Thanks

On Wed, Feb 22, 2012 at 6:36 PM, lars hofhansl <lh...@yahoo.com> wrote:

> According to the documentation there are two ways to run HBase M/R jobs:
>
>
> 1. The HBase book states to run M/R jobs like export here:
> http://hbase.apache.org/book/ops_mgt.html#export
> bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename>
> <outputdir> [<versions> [<starttime> [<endtime>]]]
>
> 2. Whereas the Javadoc says here:
> http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
> HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath`
> ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.90.0.jar export ...
>
>
> In the first case (#1) I find that the job always fails to create the
> output dir:
> java.io.IOException: Mkdirs failed to create
> file:/exports/_temporary/_attempt_local_0001_m_000000_0
>     at
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
>
> ...
>
>
> In the 2nd case (#2) I get past the creation of the output dir, and then
> it fails because it cannot find class com.google.protobuf.Message.
> I am using the HBase security branch and find that I need to add
> com.google.protobuf.Message.class in TableMapReduceUtil.addDependencyJars.
> If I do that, I can successfully run an export job using method #2.
>
>
> The 2nd issue I found looks like a bug in the HBase security branch.
> I am not sure about the first issue; is the documentation in the HBase
> book outdated?
>
>
> -- Lars
>
>