Posted to user@pig.apache.org by Neil Yalowitz <ne...@gmail.com> on 2012/05/16 02:28:25 UTC

Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)

I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm
attempting to query the contents with Pig (version 0.8.1-cdh3u3).


grunt> A = load 'test' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');
grunt> dump A;
(...)Success!
myhbasevalue1


This works when Pig runs in local mode, but when it is executed in
mapreduce mode, the MR job fails with an all-too-familiar error message:


    org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately


To make this work with Pig in local mode, I followed suggestions I found via
a web search and added the HBase classpath to PIG_CLASSPATH:


added to:  /usr/lib/pig/bin/pig

export JAVA_HOME=/usr/java/latest
export HBASE_HOME=/usr/lib/hbase
export PIG_CLASSPATH="`${HBASE_HOME}/bin/hbase classpath`:$PIG_CLASSPATH"


added to: /etc/hbase/conf/hbase-site.xml

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>myzookeeper1</value>
</property>


So again, this works with Pig in local mode.  To make my job run in
mapreduce mode, I add a target HDFS and JobTracker service to the Pig
properties:


added to: /etc/pig/conf/pig.properties

fs.default.name=hdfs://my-mr-cluster/
mapred.job.tracker=my-mr-cluster:8021


When I run the query again on the actual MR cluster, the job fails with the
ZooKeeper exception I mentioned above.

When I examine the job.xml (in the MR dashboard as well as in the temporary
TaskTracker cache), I see that hbase.zookeeper.quorum is correctly set
(myzookeeper1).  However, when I arbitrarily select a TaskTracker node and
examine the TT logs, I see that the TaskTracker thinks the ZK is
"localhost".
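
For reference, "localhost" is the value that hbase-default.xml ships for
hbase.zookeeper.quorum, so seeing it in a TaskTracker log is consistent with
that node not picking up any hbase-site.xml.  Below is a rough sketch (not
from the thread) of how one might read the value a given config file would
supply.  The helper name zk_quorum_from is made up for illustration, and the
sed pattern assumes <name> is directly followed by <value> inside the
property, as in the snippet above.

```shell
# Hypothetical helper: print the hbase.zookeeper.quorum value found in the
# Hadoop-style XML config file $1, or "localhost" (the hbase-default.xml
# value) when the file is unreadable or does not set the property.
zk_quorum_from() {
    value=$(
        [ -r "$1" ] && tr -d '\n' < "$1" |
        sed -n 's|.*<name>[[:space:]]*hbase\.zookeeper\.quorum[[:space:]]*</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p'
    )
    printf '%s\n' "${value:-localhost}"
}

# Path taken from the original post; meant to be run on a TaskTracker node.
zk_quorum_from /etc/hbase/conf/hbase-site.xml
```

Comparing the output on the submitting machine with the output on a
TaskTracker node would show whether the two see the same file.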

Any ideas?  This is mindbending.


Neil Yalowitz
neilyalowitz@gmail.com

Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)

Posted by Norbert Burger <no...@gmail.com>.
Great - glad to hear you're up and running.

On Tue, May 15, 2012 at 11:39 PM, Neil Yalowitz <ne...@gmail.com> wrote:

> > Is your HBase conf dir part of your Hadoop classpath? HBase configuration
> > settings are not pushed down to the mapreduce task level by default
>
> This was the problem.  I was setting the classpath on the machine where the
> Pig query was being executed but not on the MR cluster nodes which were
> executing the job.

Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)

Posted by Neil Yalowitz <ne...@gmail.com>.
> Is your HBase conf dir part of your Hadoop classpath? HBase configuration
> settings are not pushed down to the mapreduce task level by default

This was the problem.  I was setting the classpath on the machine where the
Pig query was being executed but not on the MR cluster nodes which were
executing the job.
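
One way to check a node for this is to test whether the HBase conf directory
appears as an entry in a classpath string, such as the java.class.path value
that the ZooKeeper client logs.  This is a sketch only and not something I
ran on the cluster: the helper name classpath_has_dir is hypothetical and the
sample classpaths are made up.

```shell
# Hypothetical helper: succeed if directory $1 is one of the colon-separated
# entries of the classpath string $2.
classpath_has_dir() {
    dir=${1%/}    # drop one trailing slash so /etc/hbase/conf/ also matches
    case ":$2:" in
        *":$dir:"* | *":$dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample values, made up for illustration.
client_cp="/etc/hbase/conf:/usr/lib/hbase/hbase.jar"
task_cp="/usr/lib/hadoop/conf:/usr/lib/hadoop/hadoop-core.jar"

for cp in "$client_cp" "$task_cp"; do
    if classpath_has_dir /etc/hbase/conf "$cp"; then
        echo "HBase conf dir present"
    else
        echo "HBase conf dir missing"
    fi
done
```

With the sample values, the first classpath reports "present" and the second
"missing", which is the split described above.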

The MR cluster in this case is managed by Cloudera's cluster tool (Cloudera
Manager), which regenerates the conf files upon service restart.  Configuring
the correct target ZooKeeper cluster-wide required adding the following to a
specific override field in Cloudera Manager under the Mapred service (the
"Mapreduce Service Configuration Safety Valve" field) and then restarting the
MR service:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>myzookeeper1</value>
</property>


Thanks Norbert, that was the exact tip I needed.


Neil Yalowitz
neilyalowitz@gmail.com

2012-05-15 22:41:06,157 [main] INFO  org.apache.zookeeper.ZooKeeper -
Client environment:java.class.path=/etc/hbase/conf....(etc)...




On Tue, May 15, 2012 at 8:34 PM, Norbert Burger <no...@gmail.com> wrote:

> Is your HBase conf dir part of your Hadoop classpath?  HBase configuration
> settings are not pushed down to the mapreduce task level by default:
>
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>
> Norbert

Re: Pig against HBase table - successful in local mode, fails in mapreduce mode (Tasktracker thinks ZK is localhost)

Posted by Norbert Burger <no...@gmail.com>.
Is your HBase conf dir part of your Hadoop classpath?  HBase configuration
settings are not pushed down to the mapreduce task level by default:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

Norbert
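
On a cluster whose config files are edited by hand, the approach the linked
page describes is to add the HBase conf directory to HADOOP_CLASSPATH on
every node, typically in hadoop-env.sh.  A minimal sketch, assuming the conf
directory is /etc/hbase/conf as in the original post; the helper name
prepend_to_classpath is hypothetical, and the TaskTrackers would most likely
need a restart before tasks see the change.

```shell
# Hypothetical helper: print the classpath string $2 with entry $1 prepended,
# or $2 unchanged if the entry is already in it.
prepend_to_classpath() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;       # already present: leave as is
        "::")     printf '%s\n' "$1" ;;       # empty classpath: just the entry
        *)        printf '%s\n' "$1:$2" ;;
    esac
}

# Assumed location, taken from the hbase-site.xml path in the original post.
HBASE_CONF_DIR=/etc/hbase/conf
HADOOP_CLASSPATH=$(prepend_to_classpath "$HBASE_CONF_DIR" "${HADOOP_CLASSPATH:-}")
export HADOOP_CLASSPATH
```

The membership check keeps the entry from piling up if the file is sourced
more than once.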
