Posted to commits@cassandra.apache.org by "Artem Aliev (JIRA)" <ji...@apache.org> on 2015/01/09 13:09:34 UTC

[jira] [Comment Edited] (CASSANDRA-8577) Values of set types not loading correctly into Pig

    [ https://issues.apache.org/jira/browse/CASSANDRA-8577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270942#comment-14270942 ] 

Artem Aliev edited comment on CASSANDRA-8577 at 1/9/15 12:08 PM:
-----------------------------------------------------------------

To reproduce the bug with the unit tests:
1. Replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.1.3.jar.
2. Run the Pig unit tests:
 ant pig-test -Dtest.name=CqlTableDataTypeTest
{code}
….
   [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104)
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
    [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
    [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
    [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
    [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
….
{code}

Cassandra 2.1 ships with Java driver 2.0, which uses the V2 native protocol. Java driver 2.1 is available and uses the V3 native protocol.
Collection serialisation changed in V3. The current implementation of the Pig reader has version 1 hardcoded for deserialisation, as a result of an incomplete fix for CASSANDRA-7287.
Version 1 should be used only in the deprecated cql-over-thrift API. CqlNativeStorage uses the Java driver protocol, so the patch passes the serialisation protocol version negotiated by the Java driver to the deserialiser when CqlNativeStorage is used. I also added an optional ‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case.
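
To illustrate why a hardcoded version breaks, here is a minimal sketch of the collection wire format. It assumes the layout described in the native protocol specification: before V3 the element count and each element length are 2-byte unsigned shorts, and from V3 on they are 4-byte ints. The class and method names are hypothetical. This is not the code from ListSerializer or SetSerializer; it only mimics the size fields.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Cassandra.
final class CollectionFormatSketch {

    // Size and length fields: 2 bytes before V3, 4 bytes from V3 on (assumed layout).
    static void writeSize(ByteBuffer out, int value, int version) {
        if (version >= 3) {
            out.putInt(value);
        } else {
            out.putShort((short) value);
        }
    }

    static int readSize(ByteBuffer in, int version) {
        return version >= 3 ? in.getInt() : in.getShort() & 0xFFFF;
    }

    // Serialise a collection of strings for the given protocol version.
    static ByteBuffer encode(List<String> values, int version) {
        int width = version >= 3 ? 4 : 2;
        int total = width;
        List<byte[]> encoded = new ArrayList<>();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            total += width + bytes.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        writeSize(out, values.size(), version);
        for (byte[] bytes : encoded) {
            writeSize(out, bytes.length, version);
            out.put(bytes);
        }
        out.flip();
        return out;
    }

    // Deserialise, rejecting leftover bytes the way the reported error does.
    static List<String> decode(ByteBuffer bytes, int version) {
        ByteBuffer in = bytes.duplicate();
        int count = readSize(in, version);
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] element = new byte[readSize(in, version)];
            in.get(element);
            result.add(new String(element, StandardCharsets.UTF_8));
        }
        if (in.hasRemaining()) {
            throw new IllegalStateException("Unexpected extraneous bytes after collection value");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Running", "onestep4red", "running");
        ByteBuffer v3 = encode(tags, 3);
        // Matching versions round-trip cleanly.
        System.out.println(decode(v3, 3));
        try {
            // A reader hardcoded to version 1 misreads the V3 size fields.
            decode(v3, 1);
        } catch (IllegalStateException e) {
            System.out.println("version 1 reader failed: " + e.getMessage());
        }
    }
}
```

In this sketch, decoding the V3 bytes with version 1 reads the first two bytes of the 4-byte count as a count of zero, so the decoder sees an empty collection followed by leftover bytes. That appears consistent with both symptoms in this ticket: the empty value on 2.1.0 and the "extraneous bytes" exception on later versions.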



was (Author: artem.aliev):
To reproduce the bug with the unit tests:
1. Replace ./build/lib/jars/cassandra-driver-core-2.0.5.jar with cassandra-driver-core-2.0.5.jar.
2. Run the Pig unit tests:
 ant pig-test -Dtest.name=CqlTableDataTypeTest
{code}
….
   [junit] org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after list value
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:104)
    [junit] at org.apache.cassandra.serializers.ListSerializer.deserializeForNativeProtocol(ListSerializer.java:27)
    [junit] at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
    [junit] at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
    [junit] at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
    [junit] at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
    [junit] at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    [junit] at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    [junit] at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    [junit] at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    [junit] at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    [junit] at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
….
{code}

Cassandra 2.1 ships with Java driver 2.0, which uses the V2 native protocol. Java driver 2.1 is available and uses the V3 native protocol.
Collection serialisation changed in V3. The current implementation of the Pig reader has version 1 hardcoded for deserialisation, as a result of an incomplete fix for CASSANDRA-7287.
Version 1 should be used only in the deprecated cql-over-thrift API. CqlNativeStorage uses the Java driver protocol, so the patch passes the serialisation protocol version negotiated by the Java driver to the deserialiser when CqlNativeStorage is used. I also added an optional ‘cassandra.input.native.protocol.version’ parameter to force the protocol version, just in case.


> Values of set types not loading correctly into Pig
> --------------------------------------------------
>
>                 Key: CASSANDRA-8577
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8577
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Oksana Danylyshyn
>            Assignee: Brandon Williams
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.1-8577.txt
>
>
> Values of set types are not loading correctly from Cassandra (cql3 table, Native protocol v3) into Pig using CqlNativeStorage. 
> When using Cassandra version 2.1.0 only empty values are loaded, and for newer versions (2.1.1 and 2.1.2) the following error is received: 
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
> at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
> Steps to reproduce:
> {code}cqlsh:socialdata> CREATE TABLE test (
>                  key varchar PRIMARY KEY,
>                  tags set<varchar>
>                );
> cqlsh:socialdata> insert into test (key, tags) values ('key', {'Running', 'onestep4red', 'running'});
> cqlsh:socialdata> select * from test;
>  key | tags
> -----+---------------------------------------
>  key | {'Running', 'onestep4red', 'running'}
> (1 rows){code}
> With version 2.1.0:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> (key,()){code}
> With version 2.1.2:
> {code}grunt> data = load 'cql://socialdata/test' using org.apache.cassandra.hadoop.pig.CqlNativeStorage();
> grunt> dump data;
> org.apache.cassandra.serializers.MarshalException: Unexpected extraneous bytes after set value
>   at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:94)
>   at org.apache.cassandra.serializers.SetSerializer.deserializeForNativeProtocol(SetSerializer.java:27)
>   at org.apache.cassandra.hadoop.pig.AbstractCassandraStorage.cassandraToObj(AbstractCassandraStorage.java:796)
>   at org.apache.cassandra.hadoop.pig.CqlStorage.cqlColumnToObj(CqlStorage.java:195)
>   at org.apache.cassandra.hadoop.pig.CqlNativeStorage.getNext(CqlNativeStorage.java:106)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
>   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212){code}
> Expected result:
> {code}(key,(Running,onestep4red,running)){code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)