Posted to user@pig.apache.org by Royston Sellman <ro...@googlemail.com> on 2012/04/19 22:41:24 UTC

HBaseStorage not working

Does HBaseStorage work with HBase 0.95?

This code was working with HBase 0.92 and Pig 0.9 but fails on HBase 0.95
and Pig 0.11 (built from source):

register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar

tbl1 = LOAD 'input/sse.tbl1.HEADERLESS.csv' USING PigStorage(',') AS (
      ID:chararray,
      hp:chararray,
      pf:chararray,
      gz:chararray,
      hid:chararray,
      hst:chararray,
      mgz:chararray,
      gg:chararray,
      epc:chararray );

STORE tbl1 INTO 'hbase://sse.tbl1'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('edrp:hp edrp:pf
edrp:gz edrp:hid edrp:hst edrp:mgz edrp:gg edrp:epc');

 

The job output (using either Grunt or PigServer makes no difference) shows
the family:descriptors being added by HBaseStorage, then starts up the MR
job, which (after a long pause) reports:

------------

Input(s):
Failed to read data from
"hdfs://namenode:8020/user/hadoop1/input/sse.tbl1.HEADERLESS.csv"

Output(s):
Failed to produce result in "hbase://sse.tbl1"

INFO mapReduceLayer.MapReduceLauncher: Failed!
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hp
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:pf
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gz
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hid
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:hst
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:mgz
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:gg
INFO hbase.HBaseStorage: Adding family:descriptor filters with values edrp:epc
------------

 

The "Failed to read" is misleading I think because dump tbl1; in place of
the store works fine. 

 

I get nothing in the HBase logs and nothing in the Pig log.

 

HBase works fine from the shell; I can read and write to the table. Pig
works fine reading and writing CSVs in and out of HDFS.

 

Any ideas?

 

Royston

 


Re: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
Thanks Rajgopal. Should I create a Jira? (I've never done that before.)

Do you know if anybody is successfully running Pig 0.11 on HBase 0.95 & Hadoop 1.0.3? 

Regards,
Royston


On 20 Apr 2012, at 14:42, Rajgopal Vaithiyanathan wrote:

> That is a mistake. It should be corrected!
> 
> The way you are using it is right.
> The ID (first column) will be the HBase row key, and the others will go
> into the columns you name in the argument to HBaseStorage.


Re: HBaseStorage not working

Posted by Rajgopal Vaithiyanathan <ra...@gmail.com>.
That is a mistake. It should be corrected!

The way you are using it is right.
The ID (first column) will be the HBase row key, and the others will go
into the columns you name in the argument to HBaseStorage.
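
For illustration, a minimal sketch of that mapping (the table name 'people'
and column family 'cf' below are hypothetical): the first field becomes the
row key and is not listed in the HBaseStorage argument; only the remaining
fields map to columns:

-- 'id' becomes the HBase row key; 'name' and 'age' go to cf:name, cf:age.
users = LOAD 'input/users.csv' USING PigStorage(',')
        AS (id:chararray, name:chararray, age:chararray);
STORE users INTO 'hbase://people'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:age');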



-- 
Thanks and Regards,
Rajgopal Vaithiyanathan.

RE: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
OK, I'll ping the HBase folks...

Meanwhile, are the HBaseStorage docs correct? The example shows the STORE
command having 'USING' and 'AS' clauses, but 'AS' gives a parse error. 'AS'
is valid in LOAD statements, though.
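
For reference, a minimal sketch of the distinction (relation and table names
below are hypothetical): AS declares a schema and belongs to LOAD, while
STORE takes only INTO and USING:

-- AS is valid here: it declares the schema of the loaded relation.
rows = LOAD 'input/rows.csv' USING PigStorage(',')
       AS (id:chararray, v:chararray);

-- No AS clause here: STORE accepts only INTO and USING.
STORE rows INTO 'hbase://mytable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:v');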

Cheers,
Royston


-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com] 
Sent: 20 April 2012 00:03
To: user@pig.apache.org
Subject: Re: HBaseStorage not working

Nothing significant changed in Pig trunk, so I am guessing HBase changed
something; you are more likely to get help from them (they should at least
be able to point at APIs that changed and are likely to cause this sort of
thing).

You might also want to check if any of the started MR jobs have anything
interesting in their task logs.
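
(For reference, on Hadoop 1.x those task logs typically sit under the
tasktracker's userlogs directory; a sketch with hypothetical job/attempt IDs
and a default log location, as the exact layout varies by version:)

# Hypothetical IDs: inspect a task attempt's syslog on the tasktracker node
# that ran it.
less /var/log/hadoop/userlogs/job_201204190854_0001/attempt_201204190854_0001_m_000000_0/syslog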

D


RE: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
OK, so we have solved the problem, and yes, it is a config/classpath problem.

Our solution is to put a symlink to zoo.cfg into the HADOOP_INSTALL/conf
directory. Maybe this will help someone else in future...

On our installation (hadoop1.0.3-SNAPSHOT/HBase 0.95-SNAPSHOT/Pig
0.11-SNAPSHOT), our code using PigServer does not work with zoo.cfg merely
being on the Pig, HBase, and Hadoop CLASSPATHs. The tasktrackers do not get
the right IP address for ZooKeeper and hang with connection-refused errors.
The symlink fixes it, as sketched below.
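
A minimal sketch of the workaround (the exact paths below are hypothetical;
adjust to your installation):

# Hypothetical paths: link zoo.cfg into the Hadoop conf directory so that
# MR tasks pick it up from the classpath.
ln -s /opt/zookeeper/zookeeper-3.4.3/conf/zoo.cfg /opt/hadoop/conf/zoo.cfg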

However, code not using PigServer but going directly to the HBase client
DOES work WITHOUT the symlink.

Our understanding of the Pig/HBase/Hadoop stack's CLASSPATH/config universe
is not perfect, but it seems that the PigServer map-reduce launcher does not
pass through the path to zoo.cfg?

Royston




Re: HBaseStorage not working

Posted by Norbert Burger <no...@gmail.com>.
This is a config/classpath issue, no?  At the lowest level, Hadoop MR tasks
don't pick up settings from the HBase conf directory unless they're
explicitly added to the classpath, usually via hadoop/conf/hadoop-env.sh:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
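
A minimal sketch of that hadoop-env.sh addition (paths are hypothetical;
adjust to your layout):

# Hypothetical paths: put the HBase conf dir (and jars) on the classpath
# handed to MR tasks.
export HBASE_HOME=/opt/hbase/hbase-trunk
export HADOOP_CLASSPATH=$HBASE_HOME/conf:$HBASE_HOME/hbase-0.95-SNAPSHOT.jar:$HADOOP_CLASSPATH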

Perhaps the classpath that's being added to your Java jobs is slightly
different?

Norbert

RE: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
Hi,

We are still experiencing 40-60 minutes of task failure before our
HBaseStorage jobs run, but we think we've narrowed the problem down to a
specific ZooKeeper issue.

The HBaseStorage map task only works when it lands on a machine that is
actually running a ZooKeeper server as part of the quorum. The task is
typically attempted from several different nodes in the cluster, failing
repeatedly before it hits on a ZooKeeper node.

Logs show the failing task attempts are trying to connect to the localhost
machine on port 2181 to make a ZooKeeper connection (as part of the
Load/HBaseStorage map task):

...
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
...
> java.net.ConnectException: Connection refused
...

This explains why the job succeeds eventually, as we have a zookeeper quorum
server running on one of our worker nodes, but not on the other 3.
Therefore, the job fails repeatedly until it is redistributed onto the node
with the ZK server, at which point it succeeds immediately.

We therefore suspect the issue is in our ZK configuration. Our
hbase-site.xml defines the zookeeper quorum as follows:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>namenode,jobtracker,slave0</value>
    </property>

Therefore, we would expect the tasks to connect to one of those hosts when
attempting a ZooKeeper connection; however, they appear to be attempting to
connect to "localhost" (which is the default). It is as if the HBase
configuration settings here are not used.
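
(For reference, one way to pin the quorum explicitly from the Pig script, a
sketch we have not verified on our setup, is to set the property in the job
configuration before the LOAD/STORE:)

-- Hypothetical workaround: push the quorum into the job configuration.
set hbase.zookeeper.quorum 'namenode,jobtracker,slave0';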

Does anyone have any suggestions as to what might be the cause of this
behaviour?

Sending this to both lists although it is only Pig HBaseStorage jobs that
suffer this problem on our cluster. HBase Java client jobs work normally.

Thanks,
Royston

-----Original Message-----
From: Subir S [mailto:subir.sasikumar@gmail.com] 
Sent: 24 April 2012 13:29
To: user@pig.apache.org; user@hbase.apache.org
Subject: Re: HBaseStorage not working

Looping HBase group.

On Tue, Apr 24, 2012 at 5:18 PM, Royston Sellman <
royston.sellman@googlemail.com> wrote:

> We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):
>
> The script below runs fine in a few seconds using Pig in local mode,
> but with Pig in MR mode it sometimes runs quickly but usually takes
> 40 minutes to an hour.
>
> --hbaseuploadtest.pig
> register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar
> register /opt/hbase/hbase-trunk/lib/guava-r09.jar
> register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage(',')
>   AS (mid:chararray, hid:chararray, mf:chararray, mt:chararray,
>       mind:chararray, mimd:chararray, mst:chararray);
> dump raw_data;
> STORE raw_data INTO 'hbase://hbaseuploadtest'
>   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:hid info:mf
>   info:mt info:mind info:mimd info:mst');
>
> i.e.
> [hadoop1@namenode hadoop-1.0.2]$ pig -x local 
> ../pig-scripts/hbaseuploadtest.pig
> WORKS EVERY TIME!!
> But
> [hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce 
> ../pig-scripts/hbaseuploadtest.pig
> Sometimes (but rarely) runs in under a minute, often takes more than 
> 40 minutes to get to 50% but then completes to 100% in seconds. The 
> dataset is very small.
>
> Note that the dump of raw_data works in both cases. However the STORE 
> command causes the MR job to stall and the job setup task shows the 
> following errors:
> Task attempt_201204240854_0006_m_000002_0 failed to report status for 
> 602 seconds. Killing!
> Task attempt_201204240854_0006_m_000002_1 failed to report status for 
> 601 seconds. Killing!
>
> And task log shows the following stream of errors:
>
> 2012-04-24 11:57:27,427 INFO org.apache.zookeeper.ZooKeeper: 
> Initiating client connection, connectString=localhost:2181 
> sessionTimeout=180000 watcher=hconnection 0x5567d7fb
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
> 2012-04-24 11:57:27,443 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:27,443 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:27,444 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
>        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> 2012-04-24 11:57:27,445 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier 
> of this process is 6846@slave2
> 2012-04-24 11:57:27,551 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
> 2012-04-24 11:57:27,552 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:27,552 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:27,552 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at
>
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketN
> IO.jav
> a:286)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
> 2012-04-24 11:57:27,553 WARN
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly 
> transient ZooKeeper exception:
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
> 2012-04-24 11:57:27,553 INFO org.apache.hadoop.hbase.util.RetryCounter:
> Sleeping 2000ms before retry #1...
> 2012-04-24 11:57:28,652 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181
> 2012-04-24 11:57:28,653 WARN
> org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException:
> java.lang.SecurityException: Unable to locate a login configuration 
> occurred when trying to find JAAS configuration.
> 2012-04-24 11:57:28,653 INFO
> org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not 
> SASL-authenticate because the default JAAS configuration section 'Client'
> could not be found. If you are not using SASL, you may ignore this. On 
> the other hand, if you expected SASL to work, please fix your JAAS 
> configuration.
> 2012-04-24 11:57:28,653 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused etc etc
>
> Any ideas? Anyone else out there successfully running Pig 0.11
> HBaseStorage() against HBase 0.95?
>
> Thanks,
> Royston
>
>
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com]
> Sent: 20 April 2012 00:03
> To: user@pig.apache.org
> Subject: Re: HBaseStorage not working
>
> Nothing significant changed in Pig trunk, so I am guessing HBase 
> changed something; you are more likely to get help from them (they 
> should at least be able to point at APIs that changed and are likely 
> to cause this sort of thing).
>
> You might also want to check if any of the started MR jobs have 
> anything interesting in their task logs.
>
> D
>
> On Thu, Apr 19, 2012 at 1:41 PM, Royston Sellman 
> <ro...@googlemail.com> wrote:
> > Does HBaseStorage work with HBase 0.95?
> >
> >
> >
> > This code was working with HBase 0.92 and Pig 0.9 but fails on HBase
> > 0.95 and Pig 0.11 (built from source):
> >
> >
> >
> > register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar
> >
> > register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar
> >
> >
> >
> >
> >
> > tbl1 = LOAD 'input/sse.tbl1.HEADERLESS.csv' USING PigStorage( ',' ) 
> > AS (
> >
> >      ID:chararray,
> >
> >      hp:chararray,
> >
> >      pf:chararray,
> >
> >      gz:chararray,
> >
> >      hid:chararray,
> >
> >      hst:chararray,
> >
> >      mgz:chararray,
> >
> >      gg:chararray,
> >
> >      epc:chararray );
> >
> >
> >
> > STORE tbl1 INTO 'hbase://sse.tbl1'
> >
> > USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('edrp:hp
> > edrp:pf edrp:gz edrp:hid edrp:hst edrp:mgz edrp:gg edrp:epc');
> >
> >
> >
> > The job output (using either Grunt or PigServer makes no difference) 
> > shows the family:descriptors being added by HBaseStorage then starts 
> > up the MR job which (after a long pause) reports:
> >
> > ------------
> >
> > Input(s):
> >
> > Failed to read data from
> > "hdfs://namenode:8020/user/hadoop1/input/sse.tbl1.HEADERLESS.csv"
> >
> >
> >
> > Output(s):
> >
> > Failed to produce result in "hbase://sse.tbl1"
> >
> >
> >
> >
> >
> > INFO mapReduceLayer.MapReduceLauncher: Failed!
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:hp
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:pf
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:gz
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:hid
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:hst
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:mgz
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:gg
> >
> > INFO hbase.HBaseStorage: Adding family:descriptor filters with 
> > values edrp:epc
> >
> > ------------
> >
> >
> >
> > The "Failed to read" is misleading I think because dump tbl1; in 
> > place of the store works fine.
> >
> >
> >
> > I get nothing in the HBase logs and nothing in the Pig log.
> >
> >
> >
> > HBase works fine from the shell and can read and write to the table.
> > Pig works fine in and out of HDFS on CSVs.
> >
> >
> >
> > Any ideas?
> >
> >
> >
> > Royston
> >
> >
> >
>
>


RE: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
Hi,

We are still experiencing 40-60 minutes of task failures before our
HBaseStorage jobs run, but we think we've narrowed the problem down to a
specific ZooKeeper issue.

The HBaseStorage map task only works when it lands on a machine that is
actually running a ZooKeeper server as part of the quorum. The task is
typically attempted on several different nodes in the cluster, failing
repeatedly before it hits a ZooKeeper node.

Logs show that the failing task attempts are trying to connect to localhost
on port 2181 to make a ZooKeeper connection (as part of the
Load/HBaseStorage map task):

...
> 2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server /127.0.0.1:2181
...
> java.net.ConnectException: Connection refused
...

This explains why the job succeeds eventually: we have a ZooKeeper quorum
server running on one of our worker nodes but not on the other three, so the
job fails repeatedly until a task attempt lands on the node with the ZK
server, at which point it succeeds immediately.

We therefore suspect the issue is in our ZK configuration. Our
hbase-site.xml defines the ZooKeeper quorum as follows:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>namenode,jobtracker,slave0</value>
    </property>

We would therefore expect the tasks to connect to one of those hosts when
attempting a ZooKeeper connection; instead, they attempt to connect to
"localhost" (the default). It is as if our hbase-site.xml settings are not
reaching the map tasks.
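
One workaround we are considering is pinning the quorum inside the Pig script
itself, so the setting travels with the job configuration instead of depending
on hbase-site.xml being present on every task node. This is only an untested
sketch: it assumes Pig forwards 'set' properties into the job configuration
and that HBaseStorage reads the quorum from there, neither of which we have
verified on these versions:

-- untested sketch: assumes 'set' properties reach the job configuration
-- and that HBaseStorage honours them on the task side
set hbase.zookeeper.quorum 'namenode,jobtracker,slave0';
set hbase.zookeeper.property.clientPort '2181';  -- 2181 is the default anyway
register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar;
register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar;
raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage(',') AS
    (mid:chararray, hid:chararray, mf:chararray, mt:chararray,
     mind:chararray, mimd:chararray, mst:chararray);
STORE raw_data INTO 'hbase://hbaseuploadtest' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'info:hid info:mf info:mt info:mind info:mimd info:mst');

The same properties could presumably also be passed on the command line
(pig -Dhbase.zookeeper.quorum=namenode,jobtracker,slave0 ...), with the same
caveat.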

Does anyone have any suggestions as to what might be the cause of this
behaviour?

Sending this to both lists although it is only Pig HBaseStorage jobs that
suffer this problem on our cluster. HBase Java client jobs work normally.
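
It would also be worth ruling out that hbase-site.xml (with the quorum
setting) is simply missing from some tasktracker nodes. A rough check from
the head node; the host names beyond our quorum members and the conf path are
only illustrative of our install layout, so adjust as needed:

for h in namenode jobtracker slave0 slave1 slave2 slave3; do
  # hypothetical path: point at wherever hbase-site.xml lives on each node
  ssh $h "grep -A 1 hbase.zookeeper.quorum /opt/hbase/hbase-trunk/conf/hbase-site.xml"
done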

Thanks,
Royston

-----Original Message-----
From: Subir S [mailto:subir.sasikumar@gmail.com] 
Sent: 24 April 2012 13:29
To: user@pig.apache.org; user@hbase.apache.org
Subject: Re: HBaseStorage not working

Looping HBase group.



Re: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
Thanks Subir, wasn't sure if I should cross-post.

Royston


On 24 Apr 2012, at 13:29, Subir S wrote:

> Looping HBase group.


Re: HBaseStorage not working

Posted by Subir S <su...@gmail.com>.
Looping HBase group.

On Tue, Apr 24, 2012 at 5:18 PM, Royston Sellman <
royston.sellman@googlemail.com> wrote:

> We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):

RE: HBaseStorage not working

Posted by Royston Sellman <ro...@googlemail.com>.
We still haven't cracked this, but a bit more info (HBase 0.95; Pig 0.11):

The script below runs in a few seconds with Pig in local mode, but with Pig
in MR mode it sometimes completes quickly and usually takes 40 minutes to an
hour.

--hbaseuploadtest.pig
register /opt/hbase/hbase-trunk/lib/protobuf-java-2.4.0a.jar;
register /opt/hbase/hbase-trunk/lib/guava-r09.jar;
register /opt/hbase/hbase-trunk/hbase-0.95-SNAPSHOT.jar;
register /opt/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.jar;
raw_data = LOAD '/data/sse.tbl1.HEADERLESS.csv' USING PigStorage(',') AS
    (mid:chararray, hid:chararray, mf:chararray, mt:chararray,
     mind:chararray, mimd:chararray, mst:chararray);
dump raw_data;
STORE raw_data INTO 'hbase://hbaseuploadtest' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'info:hid info:mf info:mt info:mind info:mimd info:mst');

i.e.
[hadoop1@namenode hadoop-1.0.2]$ pig -x local
../pig-scripts/hbaseuploadtest.pig 
WORKS EVERY TIME!!
But
[hadoop1@namenode hadoop-1.0.2]$ pig -x mapreduce
../pig-scripts/hbaseuploadtest.pig
Sometimes (but rarely) it runs in under a minute; more often it takes over 40
minutes to reach 50% and then completes to 100% in seconds. The dataset is
very small.

Note that the dump of raw_data works in both cases. However, the STORE
command causes the MR job to stall, and the job setup task shows the
following errors:
Task attempt_201204240854_0006_m_000002_0 failed to report status for 602
seconds. Killing!
Task attempt_201204240854_0006_m_000002_1 failed to report status for 601
seconds. Killing! 

And the task log shows the following stream of errors:

2012-04-24 11:57:27,427 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 0x5567d7fb
2012-04-24 11:57:27,441 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
2012-04-24 11:57:27,443 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
2012-04-24 11:57:27,443 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
2012-04-24 11:57:27,444 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-04-24 11:57:27,445 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 6846@slave2
2012-04-24 11:57:27,551 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /127.0.0.1:2181
2012-04-24 11:57:27,552 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
2012-04-24 11:57:27,552 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
2012-04-24 11:57:27,552 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2012-04-24 11:57:27,553 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2012-04-24 11:57:27,553 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
2012-04-24 11:57:28,652 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
2012-04-24 11:57:28,653 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
2012-04-24 11:57:28,653 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
2012-04-24 11:57:28,653 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused etc etc

Any ideas? Anyone else out there successfully running Pig 0.11
HBaseStorage() against HBase 0.95?

Thanks,
Royston



-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvryaboy@gmail.com] 
Sent: 20 April 2012 00:03
To: user@pig.apache.org
Subject: Re: HBaseStorage not working

Nothing significant changed in Pig trunk, so I am guessing HBase changed
something; you are more likely to get help from them (they should at least
be able to point at APIs that changed and are likely to cause this sort of
thing).

You might also want to check if any of the started MR jobs have anything
interesting in their task logs.

D



Re: HBaseStorage not working

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Nothing significant changed in Pig trunk, so I am guessing HBase
changed something; you are more likely to get help from them (they
should at least be able to point at APIs that changed and are likely
to cause this sort of thing).

You might also want to check if any of the started MR jobs have
anything interesting in their task logs.
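
For instance, to pull the syslog of a failing attempt from the tasktracker
that ran it (a sketch: it assumes the Hadoop 1.x default log layout under
$HADOOP_HOME/logs/userlogs, and <tasktracker> and <attempt_id> are
placeholders to fill in from the JobTracker UI):

# <tasktracker> and <attempt_id> are placeholders from the JobTracker UI
ssh <tasktracker> "find \$HADOOP_HOME/logs/userlogs -path '*<attempt_id>*' -name syslog -exec tail -n 100 {} +"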

D

On Thu, Apr 19, 2012 at 1:41 PM, Royston Sellman
<ro...@googlemail.com> wrote:
> Does HBaseStorage work with HBase 0.95?