Posted to commits@cassandra.apache.org by "Philip Thompson (JIRA)" <ji...@apache.org> on 2015/01/29 19:03:35 UTC

[jira] [Comment Edited] (CASSANDRA-8358) Bundled tools shouldn't be using Thrift API

    [ https://issues.apache.org/jira/browse/CASSANDRA-8358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296200#comment-14296200 ] 

Philip Thompson edited comment on CASSANDRA-8358 at 1/29/15 6:03 PM:
---------------------------------------------------------------------

Here is my current branch: https://github.com/ptnapoleon/cassandra/tree/hadoop

I have recently received a JAR of a tentative 2.1.5 of the driver containing JAVA-312, so I can now finish work on the BulkLoader changes.

I was hitting an issue where the thread calling the java driver's connect() was being interrupted, causing connect() to fail. Currently I check Thread.interrupted() and retry when an interrupt was the reason for the failure. I am not sure how to prevent the interruption in the first place.
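A minimal sketch of that check-and-retry workaround (ConnectRetry, Connector, and MAX_RETRIES are illustrative names, not code from my branch; the real call is the driver's connect()):

```java
// Illustrative sketch only: retry connect() when the failure was caused by
// the calling thread being interrupted. All names here are hypothetical.
public final class ConnectRetry {
    private static final int MAX_RETRIES = 3;

    // Stand-in for the java driver's connect() call.
    interface Connector {
        void connect();
    }

    static void connectWithRetry(Connector connector) {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            try {
                connector.connect();
                return;
            } catch (RuntimeException e) {
                // Thread.interrupted() both checks and clears the interrupt
                // flag; only retry when an interrupt caused the failure.
                if (!Thread.interrupted())
                    throw e;
            }
        }
        throw new RuntimeException("connect() kept failing after interrupts");
    }
}
```

One caveat with this approach: Thread.interrupted() clears the flag, so a legitimate interrupt aimed at this thread gets swallowed by the retry.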

Currently, when running pig-test, only one test that uses CqlNativeStorage is failing: testCqlNativeStorageCollectionColumnTable.
It fails with the following stack trace:
{code}
java.lang.IllegalArgumentException
	at java.nio.Buffer.limit(Buffer.java:267)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:552)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:561)
	at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:118)
	at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:100)
	at org.apache.cassandra.cql3.Maps$Value.fromSerialized(Maps.java:164)
	at org.apache.cassandra.cql3.Maps$Marker.bind(Maps.java:273)
	at org.apache.cassandra.cql3.Maps$Marker.bind(Maps.java:262)
	at org.apache.cassandra.cql3.Maps$Putter.doPut(Maps.java:355)
	at org.apache.cassandra.cql3.Maps$Setter.execute(Maps.java:292)
	at org.apache.cassandra.cql3.statements.UpdateStatement.addUpdateForKey(UpdateStatement.java:98)
	at org.apache.cassandra.cql3.statements.ModificationStatement.getMutations(ModificationStatement.java:655)
	at org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:487)
	at org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:473)
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:233)
	at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:443)
	at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:134)
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
	at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
	at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
	at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
	at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
	at java.lang.Thread.run(Thread.java:745)
{code}
This errors because, in CollectionSerializer.readValue:
{code}
    public static ByteBuffer readValue(ByteBuffer input, int version)
    {
        if (version >= Server.VERSION_3)
        {
            int size = input.getInt();
            if (size < 0)
                return null;

            return ByteBufferUtil.readBytes(input, size);
        }
        else
        {
            return ByteBufferUtil.readBytesWithShortLength(input);
        }
    }
{code}
The size returned by input.getInt() is in the millions for one of the map values. I am still figuring out what differs from cassandra-2.1, where the test passes without my changes, but the ByteBuffer itself doesn't appear to be different.
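I haven't confirmed this is the cause, but one way to get an implausibly large size here is a length-encoding mismatch: if the value was written with the v2-style unsigned-short length prefix but is read through the v3 int path, getInt() swallows the two prefix bytes plus the first two value bytes. A standalone illustration (not Cassandra code; the buffer contents are made up):

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a v2/v3 length-prefix mismatch.
public final class LengthPrefixMismatch {
    // Build a buffer holding one 4-byte value behind a v2-style short length prefix.
    static ByteBuffer shortPrefixed() {
        ByteBuffer buf = ByteBuffer.allocate(2 + 4);
        buf.putShort((short) 4);   // length = 4, encoded in two bytes
        buf.putInt(0x11223344);    // the value itself
        buf.flip();
        return buf;
    }

    // v2 path: a two-byte length read yields the expected 4.
    static int readV2Size(ByteBuffer buf) {
        return buf.duplicate().getShort();
    }

    // v3 path: a four-byte length read consumes the prefix plus the first two
    // value bytes, producing 0x00041122 = 266530.
    static int readV3Size(ByteBuffer buf) {
        return buf.duplicate().getInt();
    }
}
```

With a larger first value byte the bogus size lands in the millions, and ByteBufferUtil.readBytes would then trip the Buffer.limit() IllegalArgumentException seen in the trace.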

In CqlConfigHelper, should I be creating an OUTPUT_* property for each INPUT_* property?

PigTestBase should be switched over to the java driver, but I would rather handle that in a separate ticket.
AbstractCassandraStorage may be deprecated in 3.0, but it is not working at all with the current schema-parsing queries. That also belongs in a separate ticket.


was (Author: philipthompson):
Here is my current branch: https://github.com/ptnapoleon/cassandra/compare/8358
Sorry about the WIP changes pushed to BulkLoader; ignore those for now. I have recently received a JAR of a tentative 2.1.5 of the driver containing JAVA-312, so I can finish work on this now.


> Bundled tools shouldn't be using Thrift API
> -------------------------------------------
>
>                 Key: CASSANDRA-8358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8358
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Philip Thompson
>             Fix For: 3.0
>
>
> In 2.1, we switched cqlsh to the python-driver.
> In 3.0, we got rid of cassandra-cli.
> Yet there is still code that's using legacy Thrift API. We want to convert it all to use the java-driver instead.
> 1. BulkLoader uses Thrift to query the schema tables. It should be using java-driver metadata APIs directly instead.
> 2. o.a.c.hadoop.cql3.CqlRecordWriter is using Thrift
> 3. o.a.c.hadoop.ColumnFamilyRecordReader is using Thrift
> 4. o.a.c.hadoop.AbstractCassandraStorage is using Thrift
> 5. o.a.c.hadoop.pig.CqlStorage is using Thrift
> Some of the things listed above use Thrift to get the list of partition key columns or clustering columns. Those should be converted to use the Metadata API of the java-driver.
> Somewhat related to that, we also have badly ported code from Thrift in o.a.c.hadoop.cql3.CqlRecordReader (see fetchKeys()) that manually fetches columns from schema tables instead of properly using the driver's Metadata API.
> We need all of it fixed. One exception, for now, is o.a.c.hadoop.AbstractColumnFamilyInputFormat - it's using Thrift for its describe_splits_ex() call that cannot be currently replaced by any java-driver call (?).
> Once this is done, we can stop starting the Thrift RPC port by default in cassandra.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)