Posted to user@pig.apache.org by Landon Cox <la...@360vl.com> on 2010/03/04 22:03:12 UTC
pig connecting to hadoop - Failed to create DataStorage
I just installed Hadoop and Pig yesterday on an Ubuntu Jaunty box.
Hadoop 0.18.3-6cloudera0.3.0
Apache Pig version 0.6.0 (r910629)
The Hadoop services are running: I can copy files into HDFS, and the
PI-estimation example job ran successfully.
The problem I'm having is getting Pig to recognize my Hadoop cluster.
When I start Pig, it always comes up in local mode:
"Connecting to hadoop file system at: file:///"
I have the following environment variables set in my .bashrc:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOPSITEPATH=/etc/hadoop/conf/hadoop-site.xml
Once I added these as well:
export PIG_CLASSPATH=/etc/hadoop/conf
export PIGDIR=/home/lcox/pig-0.6.0
export HADOOP_HOME=/usr/lib/hadoop
It attempted to connect to the cluster but failed immediately:
pig hadoop version: 20
2010-03-04 13:58:38,957 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/lcox/pig-0.6.0/bin/pig_1267736318956.log
2010-03-04 13:58:39,060 [main] WARN  org.apache.hadoop.conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
2010-03-04 13:58:39,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2010-03-04 13:58:39,554 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
Details at logfile: /home/lcox/pig-0.6.0/bin/pig_1267736318956.log
I dumped out the PIG_HADOOP_VERSION environment variable - as you can
see, it defaults to 20 with this version of Pig.
I searched and found others who had played with that variable,
downgraded it (say from 18 to 17), and got theirs to work.
But I've tried 17, 18, 19, and 20 - all fail the same way with "Failed
to create DataStorage".
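For reference, here is how I've been cycling through versions - a sketch, assuming the variable name bin/pig reads in Pig 0.6.0, so nothing in .bashrc needs to change between attempts:

```shell
# Override the Hadoop RPC version Pig assumes, per invocation.
# PIG_HADOOP_VERSION is consulted by the bin/pig launch script in Pig 0.6.0;
# I tried each of 17, 18, 19, and 20 this way.
PIG_HADOOP_VERSION=18 ./pig
```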
Example:
lcox@hurricane:~/pig-0.6.0/bin$ ./pig
pig hadoop version: 18
2010-03-04 13:54:33,488 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/lcox/pig-0.6.0/bin/pig_1267736073487.log
2010-03-04 13:54:33,591 [main] WARN  org.apache.hadoop.conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
2010-03-04 13:54:33,838 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020
2010-03-04 13:54:34,105 [main] ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Failed to create DataStorage
Details at logfile: /home/lcox/pig-0.6.0/bin/pig_1267736073487.log
If you look in the log, this is the relevant exception:
java.lang.RuntimeException: Failed to create DataStorage
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:186)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:134)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:201)
        at org.apache.pig.PigServer.<init>(PigServer.java:169)
        at org.apache.pig.PigServer.<init>(PigServer.java:158)
        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:54)
        at org.apache.pig.Main.main(Main.java:349)
Caused by: java.io.IOException: Call to localhost/127.0.0.1:8020 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
        at org.apache.hadoop.ipc.Client.call(Client.java:742)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:169)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1373)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1385)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:191)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
        ... 8 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
It's obviously stuck on some version mismatch, but I can't tell where -
as I said, I've already cycled PIG_HADOOP_VERSION through every value
from 17 to 20.
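In case it helps anyone diagnose: here is a sketch of how I've been comparing the two sides of the mismatch (paths are from my setup above; the lib directory layout is an assumption about the Pig 0.6.0 tarball):

```shell
# The Hadoop IPC wire protocol changed between 0.18 and 0.20, so a client
# built against one cannot talk to a namenode running the other - the server
# just drops the connection, which surfaces as the EOFException above.
ls /home/lcox/pig-0.6.0/lib/    # look for the bundled hadoop*.jar Pig loads
hadoop version                  # the cluster side: 0.18.3-6cloudera0.3.0
```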
Anyone have any insight on how to get this version of Pig (0.6.0) to
reliably connect to the Hadoop cluster (0.18.3-6cloudera0.3.0)?
thank you for any hints or suggestions,
Landon