Posted to user@pig.apache.org by Daniel Eklund <do...@gmail.com> on 2011/04/12 15:53:18 UTC

Greetings: first question

This question might be better diagnosed as an HBase issue, but since it's
ultimately a Pig script I want to use, I figure someone on this group could
help me out. I tried asking on the IRC channel, but I think it was in a lull.

My scenario:  I want to use Pig to call an HBase store.
My installs:  Apache Pig 0.8.0-CDH3B4; HBase 0.90.1-CDH3B4.
My sample script:

-----------
A = load 'passwd' using PigStorage(':');
rawDocs = LOAD 'hbase://daniel_product'
        USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('base:testCol1');
vals = foreach rawDocs generate $0 as val;
dump vals;
store vals into 'daniel.out';
-----------

I am consistently getting:
Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     rawDocs,vals    MAP_ONLY        Message:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Timed out
trying to locate root region
        at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)


Googling shows me similar issues:

http://search-hadoop.com/m/RPLkD1bmY4l&subj=Re+Cannot+connect+HBase+to+Pig

My current understanding is that somewhere in the interaction between Pig,
Hadoop, HBase, and ZooKeeper, there is a configuration file that needs to be
included in a classpath or a configuration directory somewhere.  I have
tried various combinations of making Hadoop aware of HBase and vice versa.
I have tried ZK running on its own, and also managed by HBase.
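For concreteness, the sort of wiring I have been attempting looks like the
following (the paths are from my own CDH install and are assumptions, not a
known-good recipe):

```shell
# Attempted wiring: make the HBase client config visible to Pig so
# HBaseStorage can find the ZooKeeper quorum. Paths are from my CDH
# install and may differ on other layouts.
export HBASE_CONF_DIR=/etc/hbase/conf             # holds hbase-site.xml
export PIG_CLASSPATH="$HBASE_CONF_DIR:$PIG_CLASSPATH"

# Also tried shipping the HBase jar explicitly at invocation time:
pig -Dpig.additional.jars=/usr/lib/hbase/hbase-0.90.1-CDH3B4.jar myscript.pig
```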

Can someone explain the dependencies here?  Any insight as to what I am
missing?  What would your diagnosis of the above message be?

thanks,
daniel

Re: Greetings: first question

Posted by Daniel Eklund <do...@gmail.com>.
success.  thanks.


Re: Greetings: first question

Posted by Daniel Eklund <do...@gmail.com>.
Looks like it:
http://archive.cloudera.com/cdh/3/pig-0.8.0+20.3.CHANGES.txt

I am assuming the issue was PIG-1680, and it shows that your change was
rolled into the update.  Thanks a bunch.  I'll try it out.

daniel

On Tue, Apr 12, 2011 at 6:42 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Daniel,
> Please upgrade your Pig version to the latest in the 0.8 branch. The 0.8
> release is not compatible with versions of HBase newer than 0.20; we bumped up
> the support in 0.8.1, which is nearing release.  Cloudera's latest CDH3 GA might
> have these patches (it was just released today) but CDH3B4 didn't.
>
> D

Re: Greetings: first question

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Daniel,
Please upgrade your Pig version to the latest in the 0.8 branch. The 0.8
release is not compatible with versions of HBase newer than 0.20; we bumped up
the support in 0.8.1, which is nearing release.  Cloudera's latest CDH3 GA might
have these patches (it was just released today) but CDH3B4 didn't.

D


Re: Greetings: first question

Posted by Daniel Eklund <do...@gmail.com>.
Interesting.  My exact stacktrace is:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Timed out
trying to locate root region
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:280)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
    at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
    at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
    at
org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
    at
org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Timed
out trying to locate root region
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:983)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:625)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:670)
    at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:630)


So, I go to
https://repository.cloudera.com/content/repositories/releases/org/apache/hbase/hbase/0.90.1-CDH3B4/hbase-0.90.1-CDH3B4-sources.jar
to look at HConnectionManager and see that there's no locateRootRegion()
method there.

So it looks like, while I am running HBase 0.90, the Pig libs in
/usr/lib/pig/lib
ship hbase-0.20.6.jar and zookeeper-hbase-1329.jar.

I am not quite sure about the cloudera versus apache versioning schemes
going on here.
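A quick way to see the mismatch on disk (paths here are from my install and
may differ elsewhere):

```shell
# Compare the HBase jar bundled with Pig against the HBase actually
# installed on the cluster; the version skew is the suspect here.
# Paths are from my CDH3B4 layout and are assumptions.
ls /usr/lib/pig/lib/ | grep -i hbase      # here: hbase-0.20.6.jar
ls /usr/lib/hbase/hbase-*.jar             # here: hbase-0.90.1-CDH3B4.jar
```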



On Tue, Apr 12, 2011 at 6:35 PM, Bill Graham <bi...@gmail.com> wrote:

> Can you include more of your stack trace? I'm not sure of the
> specifics of what is stored where in ZK, but it seems you're timing
> out just trying to connect to ZK. Are you seeing any exceptions on the
> TT nodes, or just on the client?
>
>
> On Tue, Apr 12, 2011 at 3:24 PM, Daniel Eklund <do...@gmail.com> wrote:
> > Bill,  I have done all that both you and Jameson have suggested and still
> > get the same error.
> >
> > I can telnet into the zookeeper.  I have also used the zkClient.sh and
> can
> > look at /hbase/rs to see the regionservers.
> > Should I be able to see anything at /hbase/root-region-server?
> >
> > thanks,
> > daniel
> >
> >
> > On Tue, Apr 12, 2011 at 11:58 AM, Bill Graham <bi...@gmail.com>
> wrote:
> >
> >> Yes, Pig's HBaseStorage uses the HBase client to read/write directly
> >> to HBase from within a MR job, but chains to other Pig-generated MR
> >> jobs as needed to transform.
> >>
> >> Daniel, check that you have defined HBASE_CONF_DIR properly, or that
> >> you have hbase-site.xml in your classpath. Then try to telnet to the
> >> defined zookeeper host from the machine where the exception is being
> >> generated. There is some communication from Pig to HBase/ZK from the
> >> node that the client runs on before the MR jobs start on the cluster
> >> FYI.
> >>
> >>
> >> On Tue, Apr 12, 2011 at 8:40 AM, Jameson Lopp <ja...@bronto.com>
> wrote:
> >> > I'm by no means an expert, but I think it's the latter. My rudimentary
> >> > understanding is that pig uses HBaseStorage to load the data from
> hbase
> >> and
> >> > passes the input splits along to hadoop/MR. Feel free to correct me if
> >> I'm
> >> > wrong.
> >> > --
> >> > Jameson Lopp
> >> > Software Engineer
> >> > Bronto Software, Inc.
> >> >
> >> > On 04/12/2011 10:50 AM, Daniel Eklund wrote:
> >> >>
> >> >> As a follow-up to my own question, which accurately describes the
> >> >> component
> >> >> call-stack of the pig script I included in my post?
> >> >>
> >> >> pig ->  mapreduce/hadoop ->  Hbase
> >> >> pig  ->  Hbase ->  mapreduce/hadoop
> >> >>
> >> >>
> >> >>

Re: Greetings: first question

Posted by Bill Graham <bi...@gmail.com>.
Can you include more of your stack trace? I'm not sure of the
specifics of what is stored where in ZK, but it seems you're timing
out just trying to connect to ZK. Are you seeing any exceptions on the
TT nodes, or just on the client?



Re: Greetings: first question

Posted by Daniel Eklund <do...@gmail.com>.
Bill,  I have done all that both you and Jameson have suggested and still
get the same error.

I can telnet into the zookeeper.  I have also used the zkClient.sh and can
look at /hbase/rs to see the regionservers.
Should I be able to see anything at /hbase/root-region-server?
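For reference, the znode check can be sketched with ZooKeeper's stock CLI (the script name, host, and port here are assumptions; the zkClient.sh mentioned above may be a different wrapper):

```shell
# Assumed paths and host; adjust to your install. ZooKeeper ships zkCli.sh.
$ZOOKEEPER_HOME/bin/zkCli.sh -server zkhost:2181 <<'EOF'
ls /hbase
get /hbase/root-region-server
EOF
# In HBase 0.90, /hbase/root-region-server is expected to hold the host:port
# of the server carrying the -ROOT- region; if the znode is missing, the
# master never registered it, which would explain the "locate root region" timeout.
```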

thanks,
daniel



Re: Greetings: first question

Posted by Bill Graham <bi...@gmail.com>.
Yes, Pig's HBaseStorage uses the HBase client to read/write directly
to HBase from within a MR job, chaining to other Pig-generated MR
jobs as needed for further transformation.

Daniel, check that you have defined HBASE_CONF_DIR properly, or that
you have hbase-site.xml in your classpath. Then try to telnet to the
defined zookeeper host from the machine where the exception is being
generated. There is some communication from Pig to HBase/ZK from the
node that the client runs on before the MR jobs start on the cluster
FYI.
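Those checks might look like this from the client machine (the host name and paths are placeholders, not taken from the thread):

```shell
# Placeholder paths and host; substitute your own values.
export HBASE_CONF_DIR=/etc/hbase/conf            # must contain hbase-site.xml
export PIG_CLASSPATH=$PIG_CLASSPATH:$HBASE_CONF_DIR

# Confirm the ZooKeeper quorum host named in hbase-site.xml is reachable:
telnet zkhost 2181
# or probe it with ZooKeeper's four-letter command:
echo ruok | nc zkhost 2181    # a healthy server answers "imok"
```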



Re: Greetings: first question

Posted by Jameson Lopp <ja...@bronto.com>.
I'm by no means an expert, but I think it's the latter. My rudimentary understanding is that pig 
uses HBaseStorage to load the data from hbase and passes the input splits along to hadoop/MR. Feel 
free to correct me if I'm wrong.
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.


Re: Greetings: first question

Posted by Daniel Eklund <do...@gmail.com>.
As a follow-up to my own question: which of the following accurately describes the
component call stack of the Pig script I included in my post?

pig -> mapreduce/hadoop -> Hbase
pig  -> Hbase -> mapreduce/hadoop




Re: Greetings: first question

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Jameson,
FYI, we fixed that in the latest version of HBaseStorage, which is in 0.8.1 and
trunk (both not yet released). No more registering jars, though they do
need to be on your classpath to begin with.
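A sketch of what "on your classpath to begin with" could look like for the 0.8.1-style HBaseStorage (the jar names and paths are illustrative assumptions, not from this thread):

```shell
# Illustrative jar paths; match them to your actual install.
export PIG_CLASSPATH=$PIG_CLASSPATH:\
/usr/local/hbase/hbase-0.90.1-CDH3B4.jar:\
/usr/local/hbase/lib/zookeeper-3.3.2-CDH3B4.jar:\
/usr/local/hbase/conf
pig yourscript.pig
```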


Re: Greetings: first question

Posted by Jameson Lopp <ja...@bronto.com>.
1) Do you have the hadoop nodes defined in zoo.cfg?

2) these are the environment vars I set:
export PIG_HOME=/usr/local/pig
export HADOOP_HOME=/usr/local/hadoop
export PIG_CLASSPATH=$PIG_HOME/pig.jar:$HADOOP_HOME/conf
export HADOOPDIR=$HADOOP_HOME/conf

3) after a week of frustration trying to get pig talking to hbase, I ended up fixing it by manually 
registering jars in my pig script like so:

REGISTER /usr/local/pig/lib/google-collections-1.0.jar;
REGISTER /usr/local/pig/lib/hbase-0.20.3-1.cloudera.jar;
REGISTER /usr/local/pig/lib/zookeeper-hbase-1329.jar
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.
