Posted to user@pig.apache.org by Landon Cox <la...@360vl.com> on 2010/03/04 22:03:12 UTC

pig connecting to hadoop - Failed to create DataStorage

I just installed hadoop and pig yesterday on an ubuntu Jaunty box.

Hadoop 0.18.3-6cloudera0.3.0
Apache Pig version 0.6.0 (r910629)

I have the Hadoop services running, I can copy files into HDFS, and I ran the example job that computes Pi.

The problem I'm having is getting Pig to recognize my Hadoop cluster. When I start Pig, it always starts in local mode:

"Connecting to hadoop file system at: file:///"

I have the following environment variables set in my .bashrc:

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOPSITEPATH=/etc/hadoop/conf/hadoop-site.xml

Once I added these as well:

export PIG_CLASSPATH=/etc/hadoop/conf
export PIGDIR=/home/lcox/pig-0.6.0
export HADOOP_HOME=/usr/lib/hadoop

It attempted to connect to the cluster but failed immediately:

pig hadoop version: 20
2010-03-04 13:58:38,957 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/lcox/pig-0.6.0/bin/pig_1267736318956.log
2010-03-04 13:58:39,060 [main] WARN  org.apache.hadoop.conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
2010-03-04 13:58:39,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2010-03-04 13:58:39,554 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
Details at logfile: /home/lcox/pig-0.6.0/bin/pig_1267736318956.log
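For reference, here is the environment described above collected in one place; the paths are the ones from this message (Cloudera config under /etc/hadoop/conf), so adjust them for your install. The grep at the end is just a quick check that the NameNode address Pig will read is actually present in the config directory on PIG_CLASSPATH:

```shell
# Environment as described in this message (adjust paths as needed).
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/usr/lib/hadoop
export PIG_CLASSPATH=/etc/hadoop/conf     # Pig reads fs.default.name from here
export PIGDIR=/home/lcox/pig-0.6.0

# Sanity check: the NameNode URI Pig will dial should appear here.
grep -A 1 'fs.default.name' /etc/hadoop/conf/hadoop-site.xml
```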

I dumped out the pig hadoop version environment variable; as you can see, it defaults to 20 with this version of Pig.

I searched and found others who had played with the version variable and downgraded it (say, from 18 to 17) and got theirs to work. But I've tried 17, 18, 19, and 20 - all fail the same way with "Failed to create DataStorage".

Example:

lcox@hurricane:~/pig-0.6.0/bin$ ./pig
pig hadoop version: 18
2010-03-04 13:54:33,488 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/lcox/pig-0.6.0/bin/pig_1267736073487.log
2010-03-04 13:54:33,591 [main] WARN  org.apache.hadoop.conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
2010-03-04 13:54:33,838 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2010-03-04 13:54:34,105 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
Details at logfile: /home/lcox/pig-0.6.0/bin/pig_1267736073487.log

If you look in the log, this is the relevant exception:

java.lang.RuntimeException: Failed to create DataStorage
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:186)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
	at org.apache.pig.impl.PigContext.connect(PigContext.java:201)
	at org.apache.pig.PigServer.<init>(PigServer.java:169)
	at org.apache.pig.PigServer.<init>(PigServer.java:158)
	at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
	at org.apache.pig.Main.main(Main.java:349)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
	at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
	at org.apache.hadoop.ipc.Client.call(Client.java:742)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
	at $Proxy0.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
	at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:169)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1373)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1385)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
	at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
	... 8 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

It's obviously stuck on some version mismatch, but I can't really tell where - as I said, I've changed PIG_HADOOP_VERSION to every value from 17 to 20.
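One hedged reading of the trace: the EOFException thrown under $Proxy0.getProtocolVersion is the classic signature of a Hadoop RPC version mismatch - the Hadoop client classes Pig 0.6.0 bundles speak a newer wire protocol than the 0.18.3 NameNode, and PIG_HADOOP_VERSION by itself may only select a shim rather than swap those client classes. A sketch of the two things worth trying (the exact build property is an assumption - check Pig's build.xml):

```shell
# 1) Select the shim bin/pig loads (what was already tried above):
export PIG_HADOOP_VERSION=18

# 2) If that is not enough, rebuild pig.jar against the cluster's own
#    Hadoop jars so client and NameNode speak the same RPC version.
#    Whether a -D property is needed, and its name, is an assumption;
#    consult build.xml in the Pig source tree.
cd /home/lcox/pig-0.6.0
ant clean jar
```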

Does anyone have any insight on how to get this version of Pig (0.6.0) to reliably connect to a Hadoop 0.18.3-6cloudera0.3.0 cluster?
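In the meantime, a quick way to confirm that something is actually listening where Pig is dialing (hdfs://localhost:8020), using only bash's built-in /dev/tcp so no Hadoop client code is involved. If the port is open yet Pig still gets this EOFException, that points at a protocol mismatch rather than a connectivity problem:

```shell
# Probe the NameNode RPC port Pig is dialing (localhost:8020).
port_open=no
if (exec 3<>/dev/tcp/localhost/8020) 2>/dev/null; then
  port_open=yes
fi
echo "NameNode port 8020 open: $port_open"
```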

thank you for any hints or suggestions,

Landon