Posted to user@pig.apache.org by Jameson Lopp <ja...@bronto.com> on 2011/05/25 23:03:32 UTC

pig / hadoop / hbase compatibility (sigh)

Our production environment has undergone software upgrades and now I'm working with:

	Hadoop 0.20.2-cdh3u0
	Apache Pig version 0.8.0-cdh3u0
	HBase 0.90.1-cdh3u0

My research indicates that these all OUGHT to play together nicely... I would kill for someone to 
publish a compatibility grid for the misc versions.

Anyway, I'm trying to load from HBase:

visitors = LOAD 'hbase://track'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'open:browser open:ip open:os open:createdDate', '-caching 1000')
    AS (browser:chararray, ipAddress:chararray, os:chararray, createdDate:chararray);
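
For anyone reproducing this, the quickest way to exercise that load (assuming the 'track' table and
its 'open' column family exist) is simply:

	DESCRIBE visitors;
	recent = LIMIT visitors 10;
	DUMP recent;

though in my case Pig never gets that far; judging by the stack trace below, it dies while
connecting to HDFS at startup.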

And I'm receiving the following error; from searching around, it looks indicative of a 
compatibility issue between Pig and Hadoop:

ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:196)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:116)
	at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
	at org.apache.pig.PigServer.<init>(PigServer.java:243)
	at org.apache.pig.PigServer.<init>(PigServer.java:228)
	at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:46)
	at org.apache.pig.Main.run(Main.java:545)
	at org.apache.pig.Main.main(Main.java:108)
Caused by: java.io.IOException: Call to hadoop001/10.0.0.51:8020 failed on local exception: 
java.io.EOFException
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
	at org.apache.hadoop.ipc.Client.call(Client.java:743)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
	at $Proxy0.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
	at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
	... 9 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

Am I actually running incompatible versions? Should I bug the Cloudera folks?
-- 
Jameson Lopp
Software Engineer
Bronto Software, Inc.

Re: pig / hadoop / hbase compatibility (sigh)

Posted by Jameson Lopp <ja...@bronto.com>.
Just following up: the root cause of the problem seems to have been remnants of old Hadoop / HBase 
versions on the machine. Once I got past the DataStorage error, Pig started throwing 
"java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase (null), 
this version is 0.90.1-cdh3u0" because it was picking up a stale hbase-default.xml left over from 
the old install (a standalone copy of that file is no longer used).

In summary, a machine with a clean install of:
          Hadoop 0.20.2-cdh3u0
          Apache Pig version 0.8.0-cdh3u0
          HBase 0.90.1-cdh3u0

runs just fine with no workarounds. I no longer have to manually register jars in the Pig script or 
turn split combination off in order for HBase loading to work!
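
For reference, the workaround we previously needed amounted to a few REGISTER lines at the top of
the script plus disabling split combination. A rough sketch, with illustrative jar locations and
versions (check them against your own install):

	REGISTER /usr/lib/hbase/hbase-0.90.1-cdh3u0.jar;
	REGISTER /usr/lib/zookeeper/zookeeper.jar;
	REGISTER /usr/lib/hbase/lib/guava-r06.jar;

Split combination we simply disabled on the command line, e.g. "pig -Dpig.splitCombination=false
script.pig" (the property name may differ between Pig versions, so double-check it). None of that
is needed on the clean install.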
--
Jameson Lopp
Software Engineer
Bronto Software, Inc.

On 05/25/2011 06:29 PM, Jonathan Coveney wrote:
> I wasn't trying to use HBase, but I have had the same problem. To get around
> it, I had to create a pig-nohadoop.jar, pass in the hadoop*.jar in the
> classpath, and register antlr in pig. I think it is a pig/hadoop
> compatibility error because I got the same error, but just to be sure, can
> you run normal hadoop jobs that do not use HBase, just to isolate variables?
>
> 2011/5/25 Dmitriy Ryaboy<dv...@gmail.com>
>
>> Use Pig 0.8.1
>>
>> D

Re: pig / hadoop / hbase compatibility (sigh)

Posted by Jonathan Coveney <jc...@gmail.com>.
I wasn't trying to use HBase, but I have had the same problem. To get around
it, I had to create a pig-nohadoop.jar, pass in the hadoop*.jar in the
classpath, and register antlr in pig. I think it is a pig/hadoop
compatibility error because I got the same error, but just to be sure, can
you run normal hadoop jobs that do not use HBase, just to isolate variables?
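
Something as small as the following would do for that test; it touches HDFS and runs a plain
MapReduce job without going anywhere near HBase (the input path is just a placeholder):

	raw = LOAD '/tmp/any_text_file' AS (line:chararray);
	grp = GROUP raw ALL;
	cnt = FOREACH grp GENERATE COUNT(raw);
	DUMP cnt;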

2011/5/25 Dmitriy Ryaboy <dv...@gmail.com>

> Use Pig 0.8.1
>
> D

Re: pig / hadoop / hbase compatibility (sigh)

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Use Pig 0.8.1

D

On Wed, May 25, 2011 at 2:03 PM, Jameson Lopp <ja...@bronto.com> wrote:
> Our production environment has undergone software upgrades and now I'm
> working with:
>
>        Hadoop 0.20.2-cdh3u0
>        Apache Pig version 0.8.0-cdh3u0
>        HBase 0.90.1-cdh3u0
>
> My research indicates that these all OUGHT to play together nicely... I
> would kill for someone to publish a compatibility grid for the misc
> versions.
>
> Anyway, I'm trying to load from HBase :
>
> visitors = LOAD 'hbase://track' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('open:browser open:ip
> open:os open:createdDate', '-caching 1000')
>                                                as (browser:chararray,
> ipAddress:chararray, os:chararray, createdDate:chararray);
>
> And I'm receiving the following error, which searching around seems to be
> indicative of compatibility issues between pig and hadoop:
>
> ERROR 2999: Unexpected internal error. Failed to create DataStorage
>
> java.lang.RuntimeException: Failed to create DataStorage
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>        at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:196)
>        at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:116)
>        at org.apache.pig.impl.PigContext.connect(PigContext.java:184)
>        at org.apache.pig.PigServer.<init>(PigServer.java:243)
>        at org.apache.pig.PigServer.<init>(PigServer.java:228)
>        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:46)
>        at org.apache.pig.Main.run(Main.java:545)
>        at org.apache.pig.Main.main(Main.java:108)
> Caused by: java.io.IOException: Call to hadoop001/10.0.0.51:8020 failed on
> local exception: java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>        at org.apache.hadoop.ipc.Client.call(Client.java:743)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>        at $Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>        at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
>        at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
>        at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>        ... 9 more
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>
> Am I actually running incompatible versions? Should I bug the Cloudera
> folks?
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc.
>