You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Dominic Williams (Created) (JIRA)" <ji...@apache.org> on 2012/02/14 11:29:59 UTC

[jira] [Created] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
-------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-3908
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.7
         Environment: Ubuntu
            Reporter: Dominic Williams
             Fix For: 1.0.8


This problem looks like 2792

I am bootstrapping a new node into my cluster.

There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.

Netstats on the bootstrapping node reports it is still streaming:

root:/var/lib/cassandra/data# nodetool -h localhost netstats
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
Streaming from: /192.168.1.4
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            478
Responses                       n/a         0         496302

However, running netstats on the source nodes reports they are not streaming:

root:~# nodetool -h localhost netstats
Mode: NORMAL
 Nothing streaming to /192.168.1.11
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       13291116
Responses                       n/a         0        8334754

Examination of the logs on the source nodes show an IllegalStateException that has likely interrupted/broken the streaming process.

ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


There are two problems:

(1) the source nodes should report to the bootstrapping node that there has been a problem so that it may retry or giveup

(2) the IllegalStateException that breaks streaming



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212722#comment-13212722 ] 

Yuki Morishita commented on CASSANDRA-3908:
-------------------------------------------

Dominic,

Can you reproduce this with DEBUG logging?
When this happened, did bootstrapping node receive any data? (Are sstable files created inside your data directory?)
                
> Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>              Labels: bootstrap, repair, streaming, streams
>             Fix For: 1.0.8
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This problem looks like 2792
> I am bootstrapping a new node into my cluster.
> There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.
> Netstats on the bootstrapping node reports it is still streaming:
> root:/var/lib/cassandra/data# nodetool -h localhost netstats
> Mode: JOINING
> Not sending any streams.
> Streaming from: /192.168.1.9
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
> Streaming from: /192.168.1.4
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0            478
> Responses                       n/a         0         496302
> However, running netstats on the source nodes reports they are not streaming:
> root:~# nodetool -h localhost netstats
> Mode: NORMAL
>  Nothing streaming to /192.168.1.11
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       13291116
> Responses                       n/a         0        8334754
> Examination of the logs on the source nodes does NOT show an error for the specific sstables that are stalled. The starting of streaming is duly logged:
> pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
> sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /
> var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /
> var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
> There does however appear to have been an IllegalStateException for another sstable in this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...
> ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

Posted by "Dominic Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212732#comment-13212732 ] 

Dominic Williams commented on CASSANDRA-3908:
---------------------------------------------

We will bootstrap some more nodes soon and check details. In the end we solved the problem by simply restarting the bootstrapper, which made it join, and then running repair on problem keyspace. Aaron Morton performed procedure so feel free to ask him if he knows any more.
                
> Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>              Labels: bootstrap, repair, streaming, streams
>             Fix For: 1.0.8
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This problem looks like 2792
> I am bootstrapping a new node into my cluster.
> There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.
> Netstats on the bootstrapping node reports it is still streaming:
> root:/var/lib/cassandra/data# nodetool -h localhost netstats
> Mode: JOINING
> Not sending any streams.
> Streaming from: /192.168.1.9
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
> Streaming from: /192.168.1.4
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0            478
> Responses                       n/a         0         496302
> However, running netstats on the source nodes reports they are not streaming:
> root:~# nodetool -h localhost netstats
> Mode: NORMAL
>  Nothing streaming to /192.168.1.11
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       13291116
> Responses                       n/a         0        8334754
> Examination of the logs on the source nodes does NOT show an error for the specific sstables that are stalled. The starting of streaming is duly logged:
> pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
> sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /
> var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /
> var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
> There does however appear to have been an IllegalStateException for another sstable in this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...
> ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

Posted by "Dominic Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Williams updated CASSANDRA-3908:
----------------------------------------

    Description: 
This problem looks like 2792

I am bootstrapping a new node into my cluster.

There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.

Netstats on the bootstrapping node reports it is still streaming:

root:/var/lib/cassandra/data# nodetool -h localhost netstats
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
Streaming from: /192.168.1.4
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            478
Responses                       n/a         0         496302

However, running netstats on the source nodes reports they are not streaming:

root:~# nodetool -h localhost netstats
Mode: NORMAL
 Nothing streaming to /192.168.1.11
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       13291116
Responses                       n/a         0        8334754

Examination of the logs on the source nodes does NOT show an error for the specific sstables that are stalled. The starting of streaming is duly logged:

pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
 INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
 INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
 INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /
var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /
var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
 INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
 INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11

There does however appear to have been an IllegalStateException for another sstable in this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...

ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


  was:
This problem looks like 2792

I am bootstrapping a new node into my cluster.

There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.

Netstats on the bootstrapping node reports it is still streaming:

root:/var/lib/cassandra/data# nodetool -h localhost netstats
Mode: JOINING
Not sending any streams.
Streaming from: /192.168.1.9
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
Streaming from: /192.168.1.4
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
   FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
Pool Name                    Active   Pending      Completed
Commands                        n/a         0            478
Responses                       n/a         0         496302

However, running netstats on the source nodes reports they are not streaming:

root:~# nodetool -h localhost netstats
Mode: NORMAL
 Nothing streaming to /192.168.1.11
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0       13291116
Responses                       n/a         0        8334754

Examination of the logs on the source nodes show an IllegalStateException that has likely interrupted/broken the streaming process.

ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
        at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
        at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)


There are two problems:

(1) the source nodes should report to the bootstrapping node that there has been a problem so that it may retry or giveup

(2) the IllegalStateException that breaks streaming



    
> Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>              Labels: bootstrap, repair, streaming, streams
>             Fix For: 1.0.8
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This problem looks like 2792
> I am bootstrapping a new node into my cluster.
> There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.
> Netstats on the bootstrapping node reports it is still streaming:
> root:/var/lib/cassandra/data# nodetool -h localhost netstats
> Mode: JOINING
> Not sending any streams.
> Streaming from: /192.168.1.9
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
> Streaming from: /192.168.1.4
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0            478
> Responses                       n/a         0         496302
> However, running netstats on the source nodes reports they are not streaming:
> root:~# nodetool -h localhost netstats
> Mode: NORMAL
>  Nothing streaming to /192.168.1.11
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       13291116
> Responses                       n/a         0        8334754
> Examination of the logs on the source nodes does NOT show an error for the specific sstables that are stalled. The starting of streaming is duly logged:
> pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
> sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /
> var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /
> var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
> There does however appear to have been an IllegalStateException for another sstable in this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...
> ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (CASSANDRA-3908) Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.

Posted by "Yuki Morishita (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita resolved CASSANDRA-3908.
---------------------------------------

       Resolution: Cannot Reproduce
    Fix Version/s:     (was: 1.0.10)

I cannot reproduce this myself, but I'm sure we have some problem regarding streaming state handling.
Improved error handling is currently under development for next 1.1 release(CASSANDRA-4051), so closing this as cannot reproduce.
                
> Bootstrapping node stalls. Bootstrapper thinks it is still streaming some sstables. The source nodes do not. Caused by IllegalStateException on source nodes.
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3908
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.7
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>              Labels: bootstrap, repair, streaming, streams
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This problem looks like 2792
> I am bootstrapping a new node into my cluster.
> There are two keyspaces FightMyMonster and FMM_Studio. The first keyspace successfully streams and the whole operation is probably at 99%+ when it stalls on some sstables in the much smaller FMM_Studio keyspace.
> Netstats on the bootstrapping node reports it is still streaming:
> root:/var/lib/cassandra/data# nodetool -h localhost netstats
> Mode: JOINING
> Not sending any streams.
> Streaming from: /192.168.1.9
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%
> Streaming from: /192.168.1.4
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-201-Data.db sections=1 progress=0/50172 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-199-Data.db sections=1 progress=0/5140877 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/PartsData-hc-202-Data.db sections=1 progress=0/147346 - 0%
>    FMM_Studio: /var/lib/cassandra/data/FMM_Studio/Studio-hc-86-Data.db sections=1 progress=0/2014 - 0%
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0            478
> Responses                       n/a         0         496302
> However, running netstats on the source nodes reports they are not streaming:
> root:~# nodetool -h localhost netstats
> Mode: NORMAL
>  Nothing streaming to /192.168.1.11
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0       13291116
> Responses                       n/a         0        8334754
> Examination of the logs on the source nodes does NOT show an error for the specific sstables that are stalled. The starting of streaming is duly logged:
> pStage:1] 2012-02-14 01:40:58,746 Gossiper.java (line 804) InetAddress /192.168.1.11 is now UP
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,765 StreamOut.java (line 95) Flushing memtables for [CFS(Keyspace='FMM_Studio', ColumnFamily='Classes'), CFS(Keyspace='FMM_Studio', ColumnFamily='Part
> sData'), CFS(Keyspace='FMM_Studio', ColumnFamily='Studio'), CFS(Keyspace='FMM_Studio', ColumnFamily='AuthorClasses')]...
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOut.java (line 160) Stream context metadata [/var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db sections=1 progress=0/2460670 - 0%, /
> var/lib/cassandra/data/FMM_Studio/PartsData-hc-149-Data.db sections=1 progress=0/6158642 - 0%, /var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-134-Data.db sections=1 progress=0/160 - 0%, /
> var/lib/cassandra/data/FMM_Studio/AuthorClasses-hc-132-Data.db sections=1 progress=0/4422 - 0%], 6 sstables.
>  INFO [StreamStage:1] 2012-02-14 01:41:26,825 StreamOutSession.java (line 203) Streaming to /192.168.1.11
>  INFO [StreamStage:1] 2012-02-14 01:41:26,835 StreamOut.java (line 114) Beginning transfer to /192.168.1.11
> There does however appear to have been an IllegalStateException for another sstable in this keyspace (which occurs a second or so after streaming has begun). Perhaps this broke the streaming...
> ERROR [MiscStage:1] 2012-02-14 01:41:27,235 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> ERROR [MiscStage:1] 2012-02-14 01:41:27,285 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MiscStage:1,5,main]
> java.lang.IllegalStateException: target reports current file is /var/lib/cassandra/data/FMM_Studio/Classes-hc-144-Data.db but is null
>         at org.apache.cassandra.streaming.StreamOutSession.validateCurrentFile(StreamOutSession.java:195)
>         at org.apache.cassandra.streaming.StreamReplyVerbHandler.doVerb(StreamReplyVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira