You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Pengyu Hou (JIRA)" <ji...@apache.org> on 2013/12/10 02:28:06 UTC

[jira] [Created] (CASSANDRA-6466) Using Pig_Cassandra while Loading Column Family from Cassandra

Pengyu Hou created CASSANDRA-6466:
-------------------------------------

             Summary: Using Pig_Cassandra while Loading Column Family from Cassandra 
                 Key: CASSANDRA-6466
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6466
             Project: Cassandra
          Issue Type: Bug
         Environment: MAC OS with Single node hadoop 
            Reporter: Pengyu Hou
            Priority: Minor
             Fix For: 1.2.12


My cassandra version 1.2.12, Pig version, 0.7.0, hadoop version 1.0.4,

My dataset is like this way, user_name | tweet | user_id:
mr_tootall93 | I love you @Beyonce | 408845338565300224

For cqlsh part:
cqlsh:pxh130430> CREATE COLUMNFAMILY twitters (user varchar, tweet varchar, user_id varchar, PRIMARY KEY(user_id));

cqlsh:pxh130430> COPY twitters (user, tweet, user_id) FROM '~/nameT.csv' with delimiter = '|';

3625 rows imported in 3.520 seconds.

Then, for pig_cassandra part,
grunt> rows = LOAD 'cql://pxh130430/twitters' USING CqlStorage();
grunt> describe rows;                                            
rows: {user_id: chararray,tweet: chararray,user: chararray}
grunt> dump rows;                                                
2013-12-09 18:18:36,019 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,024 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,029 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,034 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,038 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,043 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,047 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,051 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,063 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,069 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,074 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,079 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,086 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,092 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,097 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,102 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,106 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,114 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,120 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,125 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,126 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1480062582/tmp1079331104:org.apache.pig.builtin.BinStorage) - 1-1195 Operator Key: 1-1195)
2013-12-09 18:18:36,126 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-12-09 18:18:36,126 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-12-09 18:18:36,129 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,130 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:36,130 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-12-09 18:18:37,778 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2013-12-09 18:18:37,780 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:37,780 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-12-09 18:18:37,781 [Thread-191] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2013-12-09 18:18:37,950 [Thread-191] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:37,953 [Thread-191] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,009 [Thread-200] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,027 [Thread-200] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,028 [Thread-200] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,030 [Thread-200] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,032 [Thread-200] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:38,067 [Thread-200] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0013
java.lang.RuntimeException
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initialize(PigRecordReader.java:133)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: InvalidRequestException(why:Expected 8 or 0 byte long (1))
	at org.apache.cassandra.thrift.Cassandra$execute_prepared_cql3_query_result.read(Cassandra.java:41868)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_prepared_cql3_query(Cassandra.java:1689)
	at org.apache.cassandra.thrift.Cassandra$Client.execute_prepared_cql3_query(Cassandra.java:1674)
	at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:635)
	... 7 more
2013-12-09 18:18:38,281 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0013
2013-12-09 18:18:38,281 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-12-09 18:18:43,289 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-12-09 18:18:43,289 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2013-12-09 18:18:43,289 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1480062582/tmp1079331104"
2013-12-09 18:18:43,289 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-12-09 18:18:43,295 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2013-12-09 18:18:43,296 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias rows
Details at logfile: /Users/pengyuhou/apache-cassandra-1.2.12-src/examples/pig/bin/pig_1386623559235.log

What's wrong with this? How can I get the output just as the column family?

If I tried another way:
For cqlsh part:
CREATE TABLE twitters ( user_id varchar PRIMARY KEY, tweet varchar, user varchar);
COPY twitters (user, tweet, user_id) FROM '/tmp/nameT.csv' with delimiter = '|';

Then, for the pig_canssandra part:
grunt> rows = LOAD 'cassandra://pxh130430/twitters' USING CassandraStorage();                                         
grunt> describe rows;
rows: {key: chararray,columns: {(name: (null),value: bytearray)}}

However, in this way, pig just treated the tweet&user_name as one column. How can I get the output successfully just like this way: rows: {user_id: chararray,tweet: chararray,user: chararray} 

Thank you so much!!



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)