You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Carl Bruecken <ca...@corp.aol.com> on 2010/09/01 00:54:52 UTC

Re: TTransportException intermittently in 0.7

  I've made some progress on narrowing this down and am able to 
reproduce easily.   I am using pelops as a client and I configured the 
policy in pelops to only establish 1 connection to a cassandra node.  
I'm able to step through the pelops code line by line and see the 
resulting thrift transport logging in cassandra.   Seems that flushing 
the transport causes the unwanted TTransportConnection in the server and 
subsequent closing of the connection.   The connection should stay open 
after flushing.   When there are many connection established the 
behaviour seems intermittent and many operations succeed.



Here are the details

1) The trigger from the client side is when the framed transport is flushed.
                conn.getAPI().batch_mutate(convertedBatch, cLevel);
                 // Flush connection
                 conn.flush();

2) In CustomTThreadPoolServer.java in Cassandra I modified the code to 
log TTransportExceptions.

         catch (TTransportException ttx) {
         LOGGER.error("Transport exception", ttx);
     } catch (TException tx) {
         LOGGER.error("Thrift error occurred during processing of 
message.", tx);
     } catch (Exception x) {
         LOGGER.error("Error occurred during processing of message.", x);
     }


3) Here is the exception that is ignored in cassandra.   Flushing the 
transport causes the server to believe the client has closed the connection.

org.apache.thrift.transport.TTransportException: Cannot read. Remote 
side has closed. Tried to read 4 bytes, but only got 0 bytes.
     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
     at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
     at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:637)

4) The next batch mutate to this connection caused the exception in the 
client

  WARN [main] 2010-08-31 18:40:06,749 Operand.java (line 72) Operation 
failed as result of network exception. Connection must be destroyed.  
See cause for details...
org.apache.thrift.transport.TTransportException: 
java.net.SocketException: Connection reset
     at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
     at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
     at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
     at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
     at 
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
     at 
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:38)
     at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:53)
     at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:49)
     at com.aol.data.c7.App.doWork(App.java:41)
     at com.aol.data.c7.App.main(App.java:77)
Caused by: java.net.SocketException: Connection reset
     at java.net.SocketInputStream.read(SocketInputStream.java:168)
     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
     at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
     ... 15 more




On 8/31/10 4:04 PM, Jonathan Ellis wrote:
>
> No, I don't know that anyone has reproduced that.  TTransportException
> always means "something went wrong on the thrift side" in my
> experience, it shouldn't be cassandra-version specific.
>
> On Tue, Aug 31, 2010 at 12:53 PM, Carl Bruecken
> <ca...@corp.aol.com> wrote:
> >
> >  Are there any estimates as to when a fix for this will be checked into
> > trunk?
> >
> > Coincidentally, has anyone tracked down the issue?
> >
> >  I'm experiencing same issue with nightly build from a week ago.
> >
> > Thank You
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: TTransportException intermittently in 0.7

Posted by Andres March <am...@qualcomm.com>.

  I saw this with pelops and only with batch mutate.  Other calls worked 
fine.

On 09/01/2010 08:16 AM, Carl Bruecken wrote:
> I believe the problem here is with Pelops.   The batch_mutate does a 
> flush and the pelops flush is not required.   Removing the flush from 
> pelops fixes the issue.
>
> By chance is anyone that has seen this error using Pelops?
>
>
> On 8/31/10 6:54 PM, Carl Bruecken wrote:
>> I've made some progress on narrowing this down and am able to 
>> reproduce easily.   I am using pelops as a client and I configured 
>> the policy in pelops to only establish 1 connection to a cassandra 
>> node.  I'm able to step through the pelops code line by line and see 
>> the resulting thrift transport logging in cassandra.   Seems that 
>> flushing the transport causes the unwanted TTransportConnection in 
>> the server and subsequent closing of the connection.   The connection 
>> should stay open after flushing.   When there are many connection 
>> established the behaviour seems intermittent and many operations succeed.
>>
>>
>>
>> Here are the details
>>
>> 1) The trigger from the client side is when the framed transport is 
>> flushed.
>>                conn.getAPI().batch_mutate(convertedBatch, cLevel);
>>                 // Flush connection
>>                 conn.flush();
>>
>> 2) In CustomTThreadPoolServer.java in Cassandra I modified the code 
>> to log TTransportExceptions.
>>
>>         catch (TTransportException ttx) {
>>         LOGGER.error("Transport exception", ttx);
>>     } catch (TException tx) {
>>         LOGGER.error("Thrift error occurred during processing of 
>> message.", tx);
>>     } catch (Exception x) {
>>         LOGGER.error("Error occurred during processing of message.", x);
>>     }
>>
>>
>> 3) Here is the exception that is ignored in cassandra.   Flushing the 
>> transport causes the server to believe the client has closed the 
>> connection.
>>
>> org.apache.thrift.transport.TTransportException: Cannot read. Remote 
>> side has closed. Tried to read 4 bytes, but only got 0 bytes.
>>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
>>     at 
>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
>>     at 
>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>     at java.lang.Thread.run(Thread.java:637)
>>
>> 4) The next batch mutate to this connection caused the exception in 
>> the client
>>
>>  WARN [main] 2010-08-31 18:40:06,749 Operand.java (line 72) Operation 
>> failed as result of network exception. Connection must be destroyed.  
>> See cause for details...
>> org.apache.thrift.transport.TTransportException: 
>> java.net.SocketException: Connection reset
>>     at 
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>     at 
>> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>>     at 
>> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
>>     at 
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
>>     at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
>>     at 
>> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
>>     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
>>     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:38)
>>     at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:53)
>>     at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:49)
>>     at com.aol.data.c7.App.doWork(App.java:41)
>>     at com.aol.data.c7.App.main(App.java:77)
>> Caused by: java.net.SocketException: Connection reset
>>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>     at 
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>>     ... 15 more
>>
>>
>>
>>
>> On 8/31/10 4:04 PM, Jonathan Ellis wrote:
>>>
>>> No, I don't know that anyone has reproduced that.  TTransportException
>>> always means "something went wrong on the thrift side" in my
>>> experience, it shouldn't be cassandra-version specific.
>>>
>>> On Tue, Aug 31, 2010 at 12:53 PM, Carl Bruecken
>>> <ca...@corp.aol.com> wrote:
>>> >
>>> >  Are there any estimates as to when a fix for this will be checked 
>>> into
>>> > trunk?
>>> >
>>> > Coincidentally, has anyone tracked down the issue?
>>> >
>>> >  I'm experiencing same issue with nightly build from a week ago.
>>> >
>>> > Thank You
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>>
>>
>

-- 
*Andres March*
amarch@qualcomm.com <ma...@qualcomm.com>
Qualcomm Internet Services

Re: TTransportException intermittently in 0.7

Posted by Carl Bruecken <ca...@corp.aol.com>.

  I believe the problem here is with Pelops.   The batch_mutate does a 
flush and the pelops flush is not required.   Removing the flush from 
pelops fixes the issue.

By chance is anyone that has seen this error using Pelops?


On 8/31/10 6:54 PM, Carl Bruecken wrote:
> I've made some progress on narrowing this down and am able to 
> reproduce easily.   I am using pelops as a client and I configured the 
> policy in pelops to only establish 1 connection to a cassandra node.  
> I'm able to step through the pelops code line by line and see the 
> resulting thrift transport logging in cassandra.   Seems that flushing 
> the transport causes the unwanted TTransportConnection in the server 
> and subsequent closing of the connection.   The connection should stay 
> open after flushing.   When there are many connection established the 
> behaviour seems intermittent and many operations succeed.
>
>
>
> Here are the details
>
> 1) The trigger from the client side is when the framed transport is 
> flushed.
>                conn.getAPI().batch_mutate(convertedBatch, cLevel);
>                 // Flush connection
>                 conn.flush();
>
> 2) In CustomTThreadPoolServer.java in Cassandra I modified the code to 
> log TTransportExceptions.
>
>         catch (TTransportException ttx) {
>         LOGGER.error("Transport exception", ttx);
>     } catch (TException tx) {
>         LOGGER.error("Thrift error occurred during processing of 
> message.", tx);
>     } catch (Exception x) {
>         LOGGER.error("Error occurred during processing of message.", x);
>     }
>
>
> 3) Here is the exception that is ignored in cassandra.   Flushing the 
> transport causes the server to believe the client has closed the 
> connection.
>
> org.apache.thrift.transport.TTransportException: Cannot read. Remote 
> side has closed. Tried to read 4 bytes, but only got 0 bytes.
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
>     at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
>     at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:637)
>
> 4) The next batch mutate to this connection caused the exception in 
> the client
>
>  WARN [main] 2010-08-31 18:40:06,749 Operand.java (line 72) Operation 
> failed as result of network exception. Connection must be destroyed.  
> See cause for details...
> org.apache.thrift.transport.TTransportException: 
> java.net.SocketException: Connection reset
>     at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at 
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>     at 
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>     at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:369)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:295)
>     at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:202)
>     at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:905)
>     at 
> org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:889)
>     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:42)
>     at org.scale7.cassandra.pelops.Mutator$1.execute(Mutator.java:38)
>     at org.scale7.cassandra.pelops.Operand.tryOperation(Operand.java:53)
>     at org.scale7.cassandra.pelops.Mutator.execute(Mutator.java:49)
>     at com.aol.data.c7.App.doWork(App.java:41)
>     at com.aol.data.c7.App.main(App.java:77)
> Caused by: java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:168)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>     at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     ... 15 more
>
>
>
>
> On 8/31/10 4:04 PM, Jonathan Ellis wrote:
>>
>> No, I don't know that anyone has reproduced that.  TTransportException
>> always means "something went wrong on the thrift side" in my
>> experience, it shouldn't be cassandra-version specific.
>>
>> On Tue, Aug 31, 2010 at 12:53 PM, Carl Bruecken
>> <ca...@corp.aol.com> wrote:
>> >
>> >  Are there any estimates as to when a fix for this will be checked into
>> > trunk?
>> >
>> > Coincidentally, has anyone tracked down the issue?
>> >
>> >  I'm experiencing same issue with nightly build from a week ago.
>> >
>> > Thank You
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>