Posted to user@pig.apache.org by Fabio Souto <fs...@gmail.com> on 2011/04/05 15:42:18 UTC

Error reading data from Cassandra

Hello,

I have installed Pig 0.8.0 and Cassandra 0.7.4, and I'm not able to read data from Cassandra. I wrote a simple query just to test:

grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
grunt> dump A;


And I'm getting the following error:
==========================================================================
2011-04-05 15:33:57,669 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-04-05 15:33:57,669 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-05 15:33:57,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(hdfs://localhost/tmp/temp2037710644/tmp-29784200:org.apache.pig.impl.io.InterStorage) - scope-1 Operator Key: scope-1)
2011-04-05 15:33:57,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-04-05 15:33:57,877 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-04-05 15:33:57,877 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-04-05 15:33:57,969 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-04-05 15:33:57,990 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-05 15:34:03,376 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-04-05 15:34:03,416 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-04-05 15:34:03,929 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-05 15:34:04,597 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-05 15:34:05,942 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201104051459_0008
2011-04-05 15:34:05,943 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://localhost:50030/jobdetails.jsp?jobid=job_201104051459_0008
2011-04-05 15:34:35,912 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201104051459_0008 has failed! Stop running all dependent jobs
2011-04-05 15:34:35,918 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
2011-04-05 15:34:35,931 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-04-05 15:34:35,933 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
0.20.2-CDH3B4	0.8.0-SNAPSHOT	root	2011-04-05 15:33:57	2011-04-05 15:34:35	UNKNOWN

Failed!

Failed Jobs:
JobId	Alias	Feature	Message	Outputs
job_201104051459_0008	A	MAP_ONLY	Message: Job failed! Error - NA	hdfs://localhost/tmp/temp2037710644/tmp-29784200,

Input(s):
Failed to read data from "cassandra://msg_keyspace/messages"

Output(s):
Failed to produce result in "hdfs://localhost/tmp/temp2037710644/tmp-29784200"
==========================================================================

Any idea how to fix this?
Cheers

Re: Error reading data from Cassandra

Posted by Jeremy Hanna <je...@gmail.com>.

Are you running with 'pig -x local myscript.pig' or just with 'pig myscript.pig'?

On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:

> Hi,
> 
> I had a bad environment variable
> PIG_PARTITIONER=RandomPartitioner
> instead of
> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> but I corrected this and it is still not working. I get the same error.
> 
> Just in case, I have this in my ~/.bash_profile:
> 
> export HADOOPDIR=/etc/hadoop-0.20/conf
> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
> export CLASSPATH=$HADOOPDIR:$CLASSPATH
> 
> export PIG_CONF_DIR=$HADOOPDIR
> export PIG_CLASSPATH=/etc/hadoop/conf
> export PIG_CONF_DIR=$HADOOPDIR
> 
> export PIG_INITIAL_ADDRESS=localhost
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> 
> 
> BTW I'm using the pig version that comes with Cassandra, the one in cassandra/contrib/pig
> 
> Thanks for your time Jeremy! :)
> Fabio
> 
> On 05/04/2011, at 17:04, Jeremy Hanna wrote:
> 
>> Fabio,
>> 
>> It looks like you need to set your environment variables to connect to cassandra.  Check out the readme.  Quoting here:
>> Finally, set the following as environment variables (uppercase,
>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>> 
>> So you'll probably want to do:
>> export PIG_INITIAL_ADDRESS=localhost
>> export PIG_RPC_PORT=9160
>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>> 
>> Tante belle cose and let me know if this doesn't work,
>> 
>> Jeremy
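
A minimal preflight sketch, assuming a POSIX shell: before starting Pig, check that the three variables from the readme quoted above are exported, and that the port is numeric (the stack trace in this thread shows ConfigHelper.getRpcPort calling Integer.parseInt). The function name check_cassandra_pig_env is hypothetical and is not part of Pig or Cassandra.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig or Cassandra): report which of the
# three connection variables named in the readme are missing or malformed.
check_cassandra_pig_env() {
    status=0
    for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    # The port ends up being parsed as an integer, so reject any value
    # that contains a non-digit character. An empty value was already
    # reported as missing above.
    case "${PIG_RPC_PORT:-}" in
        '') ;;
        *[!0-9]*)
            echo "not a number: PIG_RPC_PORT"
            status=1
            ;;
    esac
    return $status
}
```

One way to use it is to run it in the same shell that launches Pig, for example `check_cassandra_pig_env && pig myscript.pig`. It only inspects the launching shell's environment, so a clean result does not by itself show that the values reach the Hadoop tasks.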
>> 
>> On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:
>> 
>>> Hi Jeremy,
>>> 
>>> Of course, here it is:
>>> 
>>> Backend error message
>>> ---------------------
>>> java.lang.NumberFormatException: null
>>> 	at java.lang.Integer.parseInt(Integer.java:417)
>>> 	at java.lang.Integer.parseInt(Integer.java:499)
>>> 	at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>>> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
>>> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>> 
>>> Pig Stack Trace
>>> ---------------
>>> ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>> 
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>> 	at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>>> 	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>> 	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>> 	at org.apache.pig.Main.run(Main.java:465)
>>> 	at org.apache.pig.Main.main(Main.java:107)
>>> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
>>> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>> 	at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>> 	at org.apache.pig.PigServer.store(PigServer.java:816)
>>> 	at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>> 	... 7 more
>>> ================================================================================
>>> 
>>> 
>>> Thanks for everything,
>>> Fabio
>>> 
>>> 
>>> On 05/04/2011, at 16:19, Jeremy Hanna wrote:
>>> 
>>>> Fabio,
>>>> 
>>>> Could you post the full stack trace from the pig_<long number>.log file in the directory where you ran Pig?
>>>> 
>>>> Thanks,
>>>> 
>>>> Jeremy
>>>> 

Re: Error reading data from Cassandra

Posted by Jeremy Hanna <je...@gmail.com>.

Glad it's working for you!  Also, I've started a GitHub project that might be helpful going forward.  It's called Pygmalion and is for info, scripts, and UDFs to help with running Pig with Cassandra.  It only has a few resources now, but I am planning on adding a couple more UDFs over the next couple of days.  Feel free to add to it as well :).

https://github.com/jeromatron/pygmalion

Jeremy

On Apr 6, 2011, at 4:15 AM, Fabio Souto wrote:

> It works. Thank you for your help Jeremy!!
> 
> Cheers
> Fabio
> 
> On 05/04/2011, at 20:08, Jeremy Hanna wrote:
> 
>> Hmmm, if it's the same error then it's still not picking up your PIG_RPC_PORT variable.
>> 
>> If you're running this in <cassandra_src>/contrib/pig:
>> 'bin/pig_cassandra -x local myscript.pig'
>> then you should only need to set PIG_HOME and the other environment variables for connecting to Cassandra.
>> 
>> If you want to run it against a cluster, what I've done is keep a Hadoop configuration locally, point PIG_CONF to <hadoop_home>/conf, and put those three variables in mapred-site.xml like this:
>> <property>
>>   <name>cassandra.thrift.address</name>
>>   <value>123.45.67.89</value>
>> </property>
>> <property>
>>   <name>cassandra.thrift.port</name>
>>   <value>9160</value>
>> </property>
>> <property>
>>   <name>cassandra.partitioner.class</name>
>>   <value>org.apache.cassandra.dht.RandomPartitioner</value>
>> </property>
>> 
>> I would make sure you can get it to run locally first though.
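
As a rough sketch of how the two forms relate, assuming a POSIX shell: the function below prints one property element for each connection variable that is exported, using the environment-variable and Hadoop property names paired in the readme. The function name emit_cassandra_properties is hypothetical, and the values are assumed to be plain text that needs no XML escaping.

```shell
#!/bin/sh
# Hypothetical helper: print one <property> element per connection setting,
# pairing each environment variable with the Hadoop property name that the
# readme lists as its equivalent.
emit_cassandra_properties() {
    for pair in \
        "PIG_INITIAL_ADDRESS cassandra.thrift.address" \
        "PIG_RPC_PORT cassandra.thrift.port" \
        "PIG_PARTITIONER cassandra.partitioner.class"
    do
        env_name=${pair%% *}    # first word: environment variable name
        prop_name=${pair#* }    # second word: Hadoop property name
        eval "value=\${$env_name:-}"
        # Skip settings that are not exported rather than writing empty values.
        [ -n "$value" ] || continue
        printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
            "$prop_name" "$value"
    done
}
```

The output is meant to be reviewed by hand and pasted inside the configuration element of mapred-site.xml. The sketch does not edit any file itself.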
>> 

Re: Error reading data from Cassandra

Posted by Fabio Souto <fs...@gmail.com>.

It works. Thank you for your help Jeremy!!

Cheers
Fabio

On 05/04/2011, at 20:08, Jeremy Hanna wrote:

> Hmmm, if it's the same error then it's not getting your PIG_RPC_PORT variable still.
> 
> If you're running this in <cassandra_src>/contrib/pig:
> 'bin/pig_cassandra -x local myscript.pig'
> then you should only need to set PIG_HOME, and the other environment variables for connecting to cassandra.
> 
> If you want to run it against a cluster, what I've done is had a hadoop configuration locally and point PIG_CONF to <hadoop_home>/conf and put those three variables in the mapred-site.xml like this:
>  <property>
>    <name>cassandra.thrift.address</name>
>    <value>123.45.67.89</value>
>  </property>
>  <property>
>    <name>cassandra.thrift.port</name>
>    <value>9160</value>
>  </property>
>  <property>
>    <name>cassandra.partitioner.class</name>
>    <value>org.apache.cassandra.dht.RandomPartitioner</value>
>  </property>
> 
> I would make sure you can get it to run locally first though.
> 
> On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:
> 
>> Hi,
>> 
>> I had a bad enviroment variable
>> PIG_PARTITIONER=RandomPartitioner 
>> instead of 
>> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>> but I correct this and still not working. I have the same error
>> 
>> Just in case I have this on my ~/.bash_profile
>> 
>> export HADOOPDIR=/etc/hadoop-0.20/conf
>> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
>> export CLASSPATH=$HADOOPDIR:$CLASSPATH
>> 
>> export PIG_CONF_DIR=$HADOOPDIR
>> export PIG_CLASSPATH=/etc/hadoop/conf
>> export PIG_CONF_DIR=$HADOOPDIR
>> 
>> export PIG_INITIAL_ADDRESS=localhost
>> export PIG_RPC_PORT=9160
>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>> 
>> 
>> BTW I'm using the pig version that comes with Cassandra, the one in cassandra/contrib/pig
>> 
>> Thanks for your time Jeremy! :)
>> Fabio
>> 
>> On 05/04/2011, at 17:04, Jeremy Hanna wrote:
>> 
>>> Fabio,
>>> 
>>> It looks like you need to set your environment variables to connect to cassandra.  Check out the readme.  Quoting here:
>>> Finally, set the following as environment variables (uppercase,
>>> underscored), or as Hadoop configuration variables (lowercase, dotted):
>>> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
>>> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
>>> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
>>> 
>>> So you'll probably want to do:
>>> export PIG_INITIAL_ADDRESS=localhost
>>> export PIG_RPC_PORT=9160
>>> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
>>> 
>>> Tante belle cose and let me know if this doesn't work,
>>> 
>>> Jeremy
>>> 
>>> On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:
>>> 
>>>> Hi Jeremy,
>>>> 
>>>> Of course, here it is:
>>>> 
>>>> Backend error message
>>>> ---------------------
>>>> java.lang.NumberFormatException: null
>>>> 	at java.lang.Integer.parseInt(Integer.java:417)
>>>> 	at java.lang.Integer.parseInt(Integer.java:499)
>>>> 	at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
>>>> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
>>>> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
>>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>>> 
>>>> Pig Stack Trace
>>>> ---------------
>>>> ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>> 
>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>> 	at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>>>> 	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>> 	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>> 	at org.apache.pig.Main.run(Main.java:465)
>>>> 	at org.apache.pig.Main.main(Main.java:107)
>>>> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
>>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
>>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
>>>> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
>>>> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>> 	at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>> 	at org.apache.pig.PigServer.store(PigServer.java:816)
>>>> 	at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>> 	... 7 more
>>>> ================================================================================
>>>> 
>>>> 
>>>> Thanks for all,
>>>> Fabio
>>>> 
>>>> 

Re: Error reading data from Cassandra

Posted by Jeremy Hanna <je...@gmail.com>.

Hmmm, if it's the same error, then it's still not picking up your PIG_RPC_PORT variable.

If you're running this in <cassandra_src>/contrib/pig:
'bin/pig_cassandra -x local myscript.pig'
then you should only need to set PIG_HOME, and the other environment variables for connecting to cassandra.

If you want to run it against a cluster, what I've done is keep a hadoop configuration locally, point PIG_CONF_DIR to <hadoop_home>/conf, and put those three variables in the mapred-site.xml like this:
  <property>
    <name>cassandra.thrift.address</name>
    <value>123.45.67.89</value>
  </property>
  <property>
    <name>cassandra.thrift.port</name>
    <value>9160</value>
  </property>
  <property>
    <name>cassandra.partitioner.class</name>
    <value>org.apache.cassandra.dht.RandomPartitioner</value>
  </property>

I would make sure you can get it to run locally first though.
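
For the cluster case, a quick check along these lines might confirm that the three property names are actually present in the file Pig reads. This is only a rough sketch under assumptions: check_cassandra_props is a made-up helper name, the default path assumes the config directory is exported as PIG_CONF_DIR, and it only greps for the <name> entries, so it does not validate the values or the XML itself.

```shell
# Hypothetical helper (not part of Pig, Hadoop or Cassandra).
# Usage: check_cassandra_props [path/to/mapred-site.xml]
check_cassandra_props() {
    # Assumption: fall back to $PIG_CONF_DIR/mapred-site.xml when no path is given.
    conf_file="${1:-${PIG_CONF_DIR:-.}/mapred-site.xml}"
    if [ ! -f "$conf_file" ]; then
        echo "no such file: $conf_file"
        return 2
    fi
    rc=0
    for prop in cassandra.thrift.address cassandra.thrift.port cassandra.partitioner.class; do
        # -F matches the name literally, -q reports only through the exit status.
        if ! grep -qF "<name>$prop</name>" "$conf_file"; then
            echo "missing property: $prop"
            rc=1
        fi
    done
    return $rc
}
```

Calling it as check_cassandra_props <hadoop_home>/conf/mapred-site.xml prints one line per missing property and returns non-zero if any is absent.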

On Apr 5, 2011, at 10:29 AM, Fabio Souto wrote:

> Hi,
> 
> I had a bad environment variable:
> PIG_PARTITIONER=RandomPartitioner
> instead of
> PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> but I corrected this and it is still not working. I get the same error.
> 
> Just in case, I have this in my ~/.bash_profile:
> 
> export HADOOPDIR=/etc/hadoop-0.20/conf
> export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
> export CLASSPATH=$HADOOPDIR:$CLASSPATH
> 
> export PIG_CONF_DIR=$HADOOPDIR
> export PIG_CLASSPATH=/etc/hadoop/conf
> export PIG_CONF_DIR=$HADOOPDIR
> 
> export PIG_INITIAL_ADDRESS=localhost
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> 
> 
> BTW I'm using the pig version that comes with Cassandra, the one in cassandra/contrib/pig
> 
> Thanks for your time Jeremy! :)
> Fabio
> 

Re: Error reading data from Cassandra

Posted by Fabio Souto <fs...@gmail.com>.

Hi,

I had a bad environment variable:
PIG_PARTITIONER=RandomPartitioner
instead of
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
but I corrected this and it is still not working. I get the same error.

Just in case, I have this in my ~/.bash_profile:

export HADOOPDIR=/etc/hadoop-0.20/conf
export HADOOP_CLASSPATH=/usr/cassandra/lib/*:$HADOOP_CLASSPATH
export CLASSPATH=$HADOOPDIR:$CLASSPATH

export PIG_CONF_DIR=$HADOOPDIR
export PIG_CLASSPATH=/etc/hadoop/conf
export PIG_CONF_DIR=$HADOOPDIR

export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner


BTW I'm using the pig version that comes with Cassandra, the one in cassandra/contrib/pig

Thanks for your time Jeremy! :)
Fabio

On 05/04/2011, at 17:04, Jeremy Hanna wrote:

> Fabio,
> 
> It looks like you need to set your environment variables to connect to cassandra.  Check out the readme.  Quoting here:
> Finally, set the following as environment variables (uppercase,
> underscored), or as Hadoop configuration variables (lowercase, dotted):
> * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
> * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
> * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
> 
> So you'll probably want to do:
> export PIG_INITIAL_ADDRESS=localhost
> export PIG_RPC_PORT=9160
> export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
> 
> All the best, and let me know if this doesn't work,
> 
> Jeremy
> 

Re: Error reading data from Cassandra

Posted by Jeremy Hanna <je...@gmail.com>.

Fabio,

It looks like you need to set your environment variables to connect to cassandra.  Check out the readme.  Quoting here:
Finally, set the following as environment variables (uppercase,
underscored), or as Hadoop configuration variables (lowercase, dotted):
* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on 
* PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
* PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

So you'll probably want to do:
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
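
Before launching Pig, a small pre-flight check like the one below could confirm that the shell really sees those three variables. Treat it as a sketch, not something from the readme: check_cassandra_env is a hypothetical function name, and it only inspects the environment of the shell you run it in, not what the Hadoop task JVMs end up receiving.

```shell
# Hypothetical helper (not part of Pig or Cassandra): reports which of the
# three variables from the readme are unset, and whether the port is numeric.
check_cassandra_env() {
    rc=0
    for name in PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            rc=1
        fi
    done
    # A set but non-numeric port would also fail when parsed as an integer.
    case "${PIG_RPC_PORT:-}" in
        '') ;;
        *[!0-9]*) echo "PIG_RPC_PORT is not a number"; rc=1 ;;
    esac
    return $rc
}
```

Something like check_cassandra_env && bin/pig_cassandra -x local myscript.pig would then refuse to start Pig when one of the variables is missing.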

All the best, and let me know if this doesn't work,

Jeremy

On Apr 5, 2011, at 9:38 AM, Fabio Souto wrote:

> Hi Jeremy,
> 
> Of course, here it is:
> 
> Backend error message
> ---------------------
> java.lang.NumberFormatException: null
> 	at java.lang.Integer.parseInt(Integer.java:417)
> 	at java.lang.Integer.parseInt(Integer.java:499)
> 	at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
> 	at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:234)
> 
> Pig Stack Trace
> ---------------
> ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
> 	at org.apache.pig.PigServer.openIterator(PigServer.java:742)
> 	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> 	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> 	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> 	at org.apache.pig.Main.run(Main.java:465)
> 	at org.apache.pig.Main.main(Main.java:107)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
> 	at org.apache.pig.PigServer.storeEx(PigServer.java:874)
> 	at org.apache.pig.PigServer.store(PigServer.java:816)
> 	at org.apache.pig.PigServer.openIterator(PigServer.java:728)
> 	... 7 more
> ================================================================================
> 
> 
> Thanks for all,
> Fabio
> 
> 
> On 05/04/2011, at 16:19, Jeremy Hanna wrote:
> 
>> Fabio,
>> 
>> Could you post the full stack trace that's found in the pig_<long number>.log that's in the directory that you ran pig?
>> 
>> Thanks,
>> 
>> Jeremy
>> 

Re: Error reading data from Cassandra

Posted by Fabio Souto <fs...@gmail.com>.

Hi Jeremy,

Of course, here it is:

Backend error message
---------------------
java.lang.NumberFormatException: null
	at java.lang.Integer.parseInt(Integer.java:417)
	at java.lang.Integer.parseInt(Integer.java:499)
	at org.apache.cassandra.hadoop.ConfigHelper.getRpcPort(ConfigHelper.java:233)
	at org.apache.cassandra.hadoop.pig.CassandraStorage.setConnectionInformation(Unknown Source)
	at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:133)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:111)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:234)

Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A. Backend error : Unable to recreate exception from backed error: java.lang.NumberFormatException: null
	at org.apache.pig.PigServer.openIterator(PigServer.java:742)
	at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
	at org.apache.pig.Main.run(Main.java:465)
	at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NumberFormatException: null
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:151)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
	at org.apache.pig.PigServer.storeEx(PigServer.java:874)
	at org.apache.pig.PigServer.store(PigServer.java:816)
	at org.apache.pig.PigServer.openIterator(PigServer.java:728)
	... 7 more
================================================================================
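
Reading the trace above: the top frames show Integer.parseInt being called from ConfigHelper.getRpcPort, and "NumberFormatException: null" is what parseInt reported on JVMs of that era when handed a null string. That suggests, though nothing in this thread confirms it, that the RPC port setting never reached the map task. Below is a minimal Java sketch of that failure pattern. It is not Cassandra's actual code: the class RpcPortLookup, the key example.rpc.port, and both helper methods are hypothetical names made up for illustration, and a plain Map stands in for the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class RpcPortLookup {
    // Hypothetical property name, used only for this illustration.
    static final String PORT_KEY = "example.rpc.port";

    // Same shape as the failing call: parse whatever the lookup returns,
    // even when the key is absent and the lookup yields null.
    static int unsafeGetPort(Map<String, String> conf) {
        return Integer.parseInt(conf.get(PORT_KEY));
    }

    // Defensive variant: name the missing setting instead of letting
    // parseInt fail on null.
    static int checkedGetPort(Map<String, String> conf) {
        String raw = conf.get(PORT_KEY);
        if (raw == null) {
            throw new IllegalStateException("Missing setting: " + PORT_KEY);
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            unsafeGetPort(conf); // key not set, so parseInt receives null
        } catch (NumberFormatException e) {
            System.out.println("unsafe lookup failed: " + e);
        }
        conf.put(PORT_KEY, "9160"); // example value only
        System.out.println("port = " + checkedGetPort(conf));
    }
}
```

The checked variant fails with a message that names the missing setting, which is easier to act on than a bare NumberFormatException surfacing from a map task.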


Thanks for everything,
Fabio


On 05/04/2011, at 16:19, Jeremy Hanna wrote:

> Fabio,
> 
> Could you post the full stack trace from the pig_<long number>.log file in the directory where you ran Pig?
> 
> Thanks,
> 
> Jeremy
> 

Re: Error reading data from Cassandra

Posted by Jeremy Hanna <je...@gmail.com>.

Fabio,

Could you post the full stack trace from the pig_<long number>.log file in the directory where you ran Pig?

Thanks,

Jeremy
