Posted to commits@cassandra.apache.org by "Vijay (Created) (JIRA)" <ji...@apache.org> on 2012/02/02 07:11:53 UTC

[jira] [Created] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Repair Streaming hangs between multiple regions
-----------------------------------------------

                 Key: CASSANDRA-3838
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.7
            Reporter: Vijay
            Assignee: Vijay
            Priority: Minor
             Fix For: 1.0.8


Streaming hangs between datacenters. There might be multiple reasons for this, but a simple fix is to add a socket timeout so that the session can retry.

The following is the netstats output from the affected node (the output stays this way for a very long period).
[test_abrepairtest@test_abrepair--euwest1c-i-1adfb753 ~]$ nt netstats
Mode: NORMAL
Streaming to: /50.17.92.159
   /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1523325354/2475291786 - 61%
   /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
   /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
   /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
   /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
   /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
   /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%



"Streaming:1" daemon prio=10 tid=0x00002aaac2060800 nid=0x1676 runnable [0x000000006be85000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(OutputRecord.java:297)
        at com.sun.net.ssl.internal.ssl.OutputRecord.write(OutputRecord.java:286)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:743)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:731)
        at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
        - locked <0x00000006afea1bd8> (a com.sun.net.ssl.internal.ssl.AppOutputStream)
        at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
        at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
        at com.ning.compress.lzf.LZFOutputStream.flush(LZFOutputStream.java:117)
        at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:152)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Streaming from: /46.51.141.51
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2241-Data.db sections=7231 progress=0/1548922508 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2231-Data.db sections=4730 progress=0/296474156 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2244-Data.db sections=7650 progress=0/1580417610 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2217-Data.db sections=7682 progress=0/196689250 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2220-Data.db sections=7149 progress=0/478695185 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2171-Data.db sections=443 progress=0/78417320 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2235-Data.db sections=6631 progress=0/2270344837 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2222-Data.db sections=4590 progress=0/1310718798 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2233-Data.db sections=4581 progress=0/595026085 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-g-2226-Data.db sections=7682 progress=0/2933920085 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2213-Data.db sections=7876 progress=0/3308781588 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2216-Data.db sections=7386 progress=0/2868167170 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2240-Data.db sections=7874 progress=0/587439833 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2254-Data.db sections=4618 progress=0/215989758 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2221-Data.db sections=7002 progress=1542191546/2475291786 - 62%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2239-Data.db sections=6266 progress=0/2190197091 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2210-Data.db sections=6698 progress=0/2304563183 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2230-Data.db sections=7662 progress=0/3082087770 - 0%
   abtests: /mnt/data/cassandra070/data/abtests/cust_allocs-hc-2229-Data.db sections=7386 progress=0/1324787539 - 0%


"Thread-198896" prio=10 tid=0x00002aaac0e00800 nid=0x4710 runnable [0x000000004251b000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
        at com.sun.net.ssl.internal.ssl.InputRecord.readV3Record(InputRecord.java:405)
        at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:360)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:798)
        - locked <0x00000005e220a170> (a java.lang.Object)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:755)
        at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
        - locked <0x00000005e220a1b8> (a com.sun.net.ssl.internal.ssl.AppInputStream)
        at com.ning.compress.lzf.LZFDecoder.readFully(LZFDecoder.java:392)
        at com.ning.compress.lzf.LZFDecoder.decompressChunk(LZFDecoder.java:190)
        at com.ning.compress.lzf.LZFInputStream.readyBuffer(LZFInputStream.java:254)
        at com.ning.compress.lzf.LZFInputStream.read(LZFInputStream.java:129)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.cassandra.utils.BytesReadTracker.readLong(BytesReadTracker.java:115)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:119)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
        at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:244)
        at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:148)
        at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:90)
        at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200566#comment-13200566 ] 

Vijay commented on CASSANDRA-3838:
----------------------------------

Hi Sylvain,
My observation is that when there is network congestion, the routers start to drop packets, which causes the write on the socket to hang. Until we write to the socket again, we will not know whether the socket is closed, so it would be better to have the timeout on both sides.

If you are OK with the above, I will add streaming_socket_timeout and its documentation in the next patch. Thanks!
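
For readers following the thread, here is a minimal sketch of what a socket timeout does on the receiving side. It is not taken from the attached patch. The class name, the method name, and the silent local peer are all hypothetical; the only real API used is java.net.Socket#setSoTimeout. Note that in Java, SO_TIMEOUT bounds blocking reads only, so a write that is stuck in socketWrite0 (as in the "Streaming:1" dump) would presumably need a separate mechanism.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StreamingSocketTimeoutSketch {

    /**
     * Connects to a local peer that never sends any data and reports
     * whether the blocked read was cut short by SO_TIMEOUT.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Hypothetical stand-in for a stalled streaming peer:
        // it accepts the connection and then stays silent.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {

            // SO_TIMEOUT applies to blocking reads on this socket only.
            socket.setSoTimeout(timeoutMillis);

            InputStream in = socket.getInputStream();
            try {
                in.read(); // without SO_TIMEOUT this would block indefinitely
                return false;
            } catch (SocketTimeoutException e) {
                // The socket itself is still valid here; the caller can
                // decide to fail the session and retry.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The 200 ms value is only there to keep the demo short; a realistic streaming timeout would be much larger and is left as a configuration decision.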
                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch
        

[jira] [Updated] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3838:
-----------------------------

    Attachment: 0001-CASSANDRA-3838.patch
    
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch, 0001-CASSANDRA-3838.patch

        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200612#comment-13200612 ] 

Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

Vijay, I do believe, though, that if you don't mind waiting a few hours for streams to abort, simply setting keep-alive is the easiest fix for your problem with inter-DC streams, and the one least likely to have negative side effects.
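
As a rough illustration of the keep-alive option, the sketch below enables SO_KEEPALIVE on a socket through java.net.Socket#setKeepAlive. The class and method names are hypothetical, and this is not code from Cassandra. The probe schedule is controlled by the operating system, not by Java; on Linux the default idle time before the first probe is on the order of two hours, which matches the "few hours" caveat above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {

    /**
     * Opens a loopback connection, turns on TCP keep-alive, and returns
     * the value the socket reports back.
     */
    static boolean enableKeepAlive() throws IOException {
        // Hypothetical local peer, used only so the socket has something to connect to.
        try (ServerSocket peer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(peer.getInetAddress(), peer.getLocalPort())) {

            // Ask the OS to probe the idle connection; a dead peer is
            // eventually detected and the blocked I/O call fails.
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keep-alive enabled: " + enableKeepAlive());
    }
}
```

The trade-off is the one described in the comment: no application-level timeout to tune, at the cost of slow detection unless the OS keep-alive settings are shortened.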

                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch
>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200617#comment-13200617 ] 

Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

Let me be more clear about why keep-alive is better.

TCP keep-alive is at the transport level, and thus independent of in-band data (or lack thereof). Imagine that you're implementing a remote procedure call protocol where the client sends:

{code}
INVOKE name-of-process arg1 arg2
{code}

The server invokes the method, and responds:

{code}
RET success|failure exit-value|exception
{code}

The first thing you need, if you are using this in some kind of production scenario, is to ensure that requests can time out. But there is a problem. Suppose you assume that this software runs on well-connected networks with a high number of requests per second; there is no reason not to time out requests quickly if the remote host is unreachable. So you set the socket timeout to 1 second. The only problem is that it will also time out every request that takes longer than 1 second simply because the method call legitimately took longer.

The conflict happens because the selection of timeout was made based on the transport level circumstances (fast local network, high throughput, no need to wait if a host is down) while the effect of the timeout is at the in-band data level and is thus triggered by a slow request.
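
To make the conflict concrete, here is a minimal Java sketch over the loopback interface. It assumes a toy server that answers a single INVOKE line after a configurable delay; the class name SlowRpcDemo, the methods timesOut and serveOnce, and all timings are hypothetical and exist only for illustration. What it tries to show is that a read timeout chosen for transport reasons also fires on a perfectly healthy connection whenever the call itself is slow.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo; names and timings are illustrative only.
final class SlowRpcDemo {

    /** Sends one INVOKE line and reports whether the client's read timed out. */
    static boolean timesOut(int soTimeoutMillis, long serverDelayMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread handler = new Thread(() -> serveOnce(server, serverDelayMillis));
            handler.setDaemon(true); // do not keep the JVM alive for the toy server
            handler.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis); // bounds each blocking read
                PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                out.println("INVOKE slow-method arg1 arg2");
                try {
                    in.readLine(); // waits for the RET line
                    return false;
                } catch (SocketTimeoutException e) {
                    return true; // the transport was fine; the call was just slow
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Toy server: reads one request, "works" for a while, then replies. */
    private static void serveOnce(ServerSocket server, long delayMillis) {
        try (Socket s = server.accept()) {
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            in.readLine();             // the INVOKE request
            Thread.sleep(delayMillis); // a legitimately slow method call
            out.println("RET success 0");
        } catch (IOException e) {
            // The client may already have given up; nothing to do in a demo.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout 200 ms, call takes 1000 ms: timed out = " + timesOut(200, 1000));
        System.out.println("timeout 2000 ms, call takes 100 ms: timed out = " + timesOut(2000, 100));
    }
}
```

In the first call the server is reachable the whole time, yet the client still sees a timeout, which is exactly the mismatch between transport-level intent and in-band effect.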

One way to fix this is to extend the protocol between client and server such that they can constantly be exchanging PING/PONG type messages (witness IRC for an example of this). This allows you to utilize socket (or read/write op) timeouts to detect a broken transport, under the assumption/premise that both sides have dedicated code for the ping/pong stuff which is independent of any delay in processing the otherwise in-band data.

Disadvantages of this approach can include the need to actually change the protocol, and (depending on implementation) additional implementation complexity as you suddenly need to actively model the transport as such.
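
As a rough illustration of the PING/PONG idea, the following Java sketch keeps only the bookkeeping: every inbound frame, heartbeat or payload, counts as proof that the transport works, so a slow RET does not trip the timeout as long as PONGs keep arriving. HeartbeatTracker and its methods are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic; a real implementation would also need the code that actually sends and parses the heartbeat frames, which is the extra complexity mentioned above.

```java
// Hypothetical sketch; class and method names are illustrative only.
final class HeartbeatTracker {
    private final long transportTimeoutMillis;
    private long lastHeardMillis;

    HeartbeatTracker(long transportTimeoutMillis, long nowMillis) {
        this.transportTimeoutMillis = transportTimeoutMillis;
        this.lastHeardMillis = nowMillis;
    }

    /** Record any inbound frame; a PONG and a RET both prove the transport works. */
    void onFrame(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** True only when nothing at all has arrived for longer than the timeout. */
    boolean transportLooksDead(long nowMillis) {
        return nowMillis - lastHeardMillis > transportTimeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(1000, 0);
        // A slow call: RET arrives only at t=5000 ms, but PONGs arrive every 500 ms.
        for (long now = 500; now < 5000; now += 500) {
            tracker.onFrame(now);
            System.out.println(now + " ms: dead = " + tracker.transportLooksDead(now + 499));
        }
        tracker.onFrame(5000); // the RET itself
        // The peer then goes silent: no PONG and no payload.
        System.out.println("6500 ms: dead = " + tracker.transportLooksDead(6500));
    }
}
```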

TCP keep-alive is a way to let the kernel/tcp, which is already supposed to support this, deal with this without adding complexity to the application. It allows what effectively boils down to a "timeout" at the transport level which can be selected based on use-case and expected networking characteristics, and is independent of the nature of the in-band data sent over that transport.

In the Cassandra case, the equivalent of the slow RPC call might be that a write() during streaming blocks for 5 seconds because socket buffers on both ends are full, and the other end is doing a GC or waiting on an fsync().

By using keep-alives we get more "correct" behavior in that such blocks won't cause connection tear-downs, while at the same time not having to change the protocol and/or add complexity to the code base to implement a protocol-within-tcp in which to mux the actual payload for streaming.
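
For reference, enabling keep-alive from Java is a single call on the socket. The sketch below is a hypothetical helper (KeepAliveDemo and its method names are not from the Cassandra code base or the attached patch) that opens a connection with SO_KEEPALIVE on and no read timeout. How quickly a dead peer is noticed is then decided by the operating system; on Linux, for example, the net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes sysctls control it, and the default idle time of two hours is usually far too long for this purpose.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper; not taken from the Cassandra code base or the attached patch.
final class KeepAliveDemo {

    /** Opens a connection with SO_KEEPALIVE enabled and no read timeout. */
    static Socket connectWithKeepAlive(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            socket.setKeepAlive(true); // idle connections are probed by the kernel
            socket.setSoTimeout(0);    // 0 means reads never time out (the default)
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Connects to a throwaway loopback listener and reports the keep-alive flag. */
    static boolean keepAliveEnabledOnLoopback() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = connectWithKeepAlive(server.getInetAddress(), server.getLocalPort())) {
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + keepAliveEnabledOnLoopback());
    }
}
```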

                


        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200477#comment-13200477 ] 

Sylvain Lebresne commented on CASSANDRA-3838:
---------------------------------------------

Is there any use in setting SO_TIMEOUT on the socket that is writing?

I also wonder if we really should reuse the rpc timeout for this (and my initial intuition is that we probably shouldn't). As far as I'm concerned, I'm fine adding a new streaming_socket_timeout option for this (we don't even have to document it in the yaml if we consider it's an advanced thing).
                


        

[jira] [Updated] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3838:
-----------------------------

    Attachment: 0001-Add-streaming-socket-timeouts.patch
    


        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200610#comment-13200610 ] 

Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

Note that simply adding a socket timeout is not a good idea unless both sides are truly expected never to starve. This is why I didn't suggest it for CASSANDRA-3569, and why TCP keep-alive is the "correct" solution: it does not generate spurious timeouts from a lack of in-band data on the channel. As noted in that ticket, though, the practical reality is that we don't control keep-alive parameters on a per-socket basis.

For example, if one of the ends is waiting a few seconds for a particularly expensive fsync(), or waiting for some kind of lock, you'd have spurious failures (whereas this is not the case for keep-alive, because the transport is alive and kicking at the kernel level). Depending on surrounding logic, it could be dangerous if it causes the receiver to believe it received the file while the sender believes it didn't (e.g. multiple streaming -> disk space explosion).

I would suggest TCP keep-alive for the reasons mentioned here and discussed in CASSANDRA-3569, and suggest that the TCP keep-alive settings be tweaked to fail quicker if that is desired.

If adding a socket timeout, thought needs to go into what kind of false failure cases will be created. If both ends are truly expected not to block on anything like compaction locks or whatever else there might be, it might be okay.

In either case, definitely *don't* use rpc timeout IMO; the concerns are completely different. A low-timeout cluster with an rpc timeout of 0.5 seconds for example would be extremely sensitive to even the slightest hiccup (such as waiting 1 second for an fsync(), or a GC pause, etc) and it would truly be useless and extremely damaging to kill streams for that.

In general, as with CASSANDRA-3569, I strongly argue that streaming should not be caused to spuriously fail because the impact of that can be huge, particularly on clusters with large nodes.

As for reads vs. writes: You definitely want timeouts on both sides in order to guarantee that you never hang under any circumstance regardless of the nature of the TCP connection loss, unless you have some other method to accomplish the same thing.

If this (socket timeouts) does go in, I argue even more strongly than before that the tear-down of streams due to failure detector as in CASSANDRA-3569 is truly just negative rather than positive (but as noted in that ticket, not hanging forever on repairs and such remains a concern).
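
To make the two mechanisms discussed above concrete, here is a minimal Java sketch. It is not taken from any of the attached patches; the class and method names are hypothetical. It shows that SO_TIMEOUT (Socket.setSoTimeout) aborts a blocking read when the peer stays silent, and that SO_KEEPALIVE (Socket.setKeepAlive) can only be switched on or off from this API, with the probe timing left to the operating system. Note that SO_TIMEOUT bounds reads only; a blocking write such as the one in the "Streaming:1" trace is not covered by it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration, not Cassandra code.
public class StreamingTimeoutSketch {

    /** Returns true if a blocking read against a silent peer is aborted by SO_TIMEOUT. */
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        // A loopback server that accepts the connection but never sends a byte,
        // standing in for a peer that has stopped making progress.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            // SO_TIMEOUT: a read blocked longer than this throws SocketTimeoutException.
            client.setSoTimeout(timeoutMillis);

            // SO_KEEPALIVE: enabled per socket, but the probe interval is an OS setting.
            client.setKeepAlive(true);

            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks: the peer never writes
                return false;   // would only be reached if data or EOF arrived
            } catch (SocketTimeoutException e) {
                return true;    // the read was aborted; the socket itself is still open
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

A timeout firing here says only that no in-band data arrived in time, which is exactly the spurious-failure risk described above when the peer is merely slow rather than gone.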

                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch
>
>


        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200841#comment-13200841 ] 

Peter Schuller commented on CASSANDRA-3838:
-------------------------------------------

{quote}
So given that for keep alive that would involve potential non-portable code and such, let's keep that for later unless someone is willing to actually write that keep-alive patch in a timely fashion.
{quote}

I have nothing against that at all, to be clear. I didn't mean to sound like I was arguing against this going in. I am very much for it, +1. In fact I'd rather have this just be turned on by default (with a reasonably high default timeout) than having FD convictions kill streams. So no argument here.

Timeouts like these can be exchanged for keep-alives in the future without really affecting the surrounding logic, if that feature becomes available.
                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch, 0001-CASSANDRA-3838-v2.patch, 0001-CASSANDRA-3838.patch
>
>


        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200801#comment-13200801 ] 

Sylvain Lebresne commented on CASSANDRA-3838:
---------------------------------------------

The default for this should be to not time out at all; we should be conservative.

I'm good with keep-alive in principle, but I think the actual itch Vijay is trying to scratch is not having to wait hours to retry the repair. So given that for keep alive that would involve potential non-portable code and such, let's keep that for later unless someone is willing to actually write that keep-alive patch in a timely fashion.
                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch, 0001-CASSANDRA-3838.patch
>
>


        

[jira] [Commented] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Vijay (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200619#comment-13200619 ] 

Vijay commented on CASSANDRA-3838:
----------------------------------

>>>> In either case, definitely don't use rpc timeout IMO; the concerns are completely different. A low-timeout cluster with an rpc timeout of 0.5 seconds 
We will add a configuration option, streaming_socket_timeout, which will be separate from rpc_timeout.

>>> If this (socket timeouts) does go in, I argue even more strongly than before that the tear-down of streams due to failure detector as in CASSANDRA-3569
I don't have an opinion on that ticket, but it looks reasonable. I would say so_timeout is the better solution for streaming, since these are not long-lived connections, but I also think keep-alive should be set for the messaging connection, as you mentioned in the other ticket.

>>> I do believe though that if you don't care about having to wait for a few hours for streams to abort
We definitely don't want to wait for hours, and I don't think we have to when we have a better option, even if we set streaming_socket_timeout to 30 seconds or even a minute.

>>> As for reads vs. writes: You definitely want timeouts on both sides in order to guarantee that you never hang under any circumstance 
Agreed, I will get the patch done in a few minutes.
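
As a rough illustration of how a separate streaming timeout setting could be applied, here is a hedged Java sketch. It is not the attached patch: the class, the helper names, and the choice of milliseconds as the unit are assumptions made for the example. It relies on one documented behaviour of java.net.Socket, namely that a SO_TIMEOUT of 0 means "wait forever", so an unset option maps naturally to the conservative no-timeout default suggested earlier in the thread.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical illustration, not Cassandra code.
public class StreamingSocketConfigSketch {

    /**
     * Maps an optional configured value (assumed to be in milliseconds) to the
     * value handed to setSoTimeout. null means "not configured" and yields 0,
     * which java.net.Socket interprets as an infinite timeout.
     */
    public static int effectiveTimeoutMillis(Integer configuredMillis) {
        if (configuredMillis == null) {
            return 0;
        }
        if (configuredMillis < 0) {
            throw new IllegalArgumentException("timeout must not be negative: " + configuredMillis);
        }
        return configuredMillis;
    }

    /** Applies the streaming timeout to a socket, independently of any RPC timeout. */
    public static void applyStreamingTimeout(Socket socket, Integer configuredMillis)
            throws SocketException {
        socket.setSoTimeout(effectiveTimeoutMillis(configuredMillis));
    }

    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) {       // unconnected; options can be set before connect
            applyStreamingTimeout(socket, 30_000); // e.g. the 30 seconds mentioned above
            System.out.println("SO_TIMEOUT = " + socket.getSoTimeout() + " ms");
        }
    }
}
```

Keeping this value separate from the RPC timeout means a cluster tuned for low request latency does not end up tearing down long-running streams on a brief pause.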
                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch


        

[jira] [Updated] (CASSANDRA-3838) Repair Streaming hangs between multiple regions

Posted by "Vijay (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3838:
-----------------------------

    Attachment: 0001-CASSANDRA-3838-v2.patch

Hi Sylvain, the default is set to no timeout in the new patch. Thanks!
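
For readers following along, here is a minimal sketch of what "no timeout by default" means at the java.net.Socket level. It is not taken from the attached patch. The class name StreamingTimeoutSketch and the helpers applyTimeout and readTimesOut are hypothetical names made up for illustration. The only real API used is Socket.setSoTimeout, where a value of 0 means a read blocks indefinitely and a positive value makes a blocked read throw SocketTimeoutException. Note that SO_TIMEOUT applies to reads only, so on its own it would presumably help the receiving thread stuck in socketRead0 rather than the sender stuck in socketWrite0.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StreamingTimeoutSketch {
    // Hypothetical helper: timeoutMs == 0 keeps the JDK default (block forever),
    // which is what "the default is set to no timeout" would look like.
    public static void applyTimeout(Socket socket, int timeoutMs) throws IOException {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeout must be >= 0");
        }
        socket.setSoTimeout(timeoutMs);
    }

    // Connects to a local peer that never sends anything and reports whether
    // the read gave up with SocketTimeoutException instead of hanging.
    // Only call this with a positive timeout; with 0 the read never returns.
    public static boolean readTimesOut(int timeoutMs) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket silentPeer = server.accept()) {
            applyTimeout(client, timeoutMs);
            try {
                client.getInputStream().read(); // the peer never writes
                return false;
            } catch (SocketTimeoutException e) {
                return true; // a caller could retry the streaming session here
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a positive value the stuck read surfaces as an exception that the session can act on; with 0 the behaviour is unchanged from before, so existing deployments are not affected unless they opt in.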
                
> Repair Streaming hangs between multiple regions
> -----------------------------------------------
>
>                 Key: CASSANDRA-3838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3838
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.0.8
>
>         Attachments: 0001-Add-streaming-socket-timeouts.patch, 0001-CASSANDRA-3838-v2.patch, 0001-CASSANDRA-3838.patch
