You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Gurpreet Singh <gu...@gmail.com> on 2010/09/14 18:16:54 UTC

Bootstrapping stays stuck

Hi,
I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
more machines, one at a time.
For both these machines, the bootstrapping stays stuck after the streaming
is done.

When the nodes come up for bootstrapping, I see all the relevant messages
about getting a new token, assuming load from a particular host. I see a
couple of nodes anticompacting data to send, and at a later point the node
that is bootstrapping prints the right streaming mesgs. However, once the
streaming is over, the node just doesnt do anything. Previously while
bootstrapping, I have seen that after the streaming is done, the node
restarts and becomes part of the ring by itself. I dont see this behaviour
with both the nodes I tried today.
I even restarted all the nodes in the cluster, and tried bootstrapping one
of the nodes again, but it again was stuck after streaming. It seems to have
copied the relevant load as well.
Any ideas as to what could be going on here?

/G

Re: Bootstrapping stays stuck

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Sep 16, 2010 at 4:59 PM, Gurpreet Singh <gu...@gmail.com>wrote:

> Thanks to driftx from cassandra IRC channel for helping out.
>

That's me. :)


> This was resolved by increasing the rpc timeout for the bootstrap process.
>

I suspect that this is a bug, because changing the rpctimeout to bootstrap
shouldn't be necessary.  However, I've been unable to replicate it, even
with much more data than you had.  If anyone else is observing this
behavior, please let me know.

-Brandon

Re: Bootstrapping stays stuck

Posted by Gurpreet Singh <gu...@gmail.com>.
Thanks to driftx from cassandra IRC channel for helping out.
This was resolved by increasing the rpc timeout for the bootstrap process.

On Wed, Sep 15, 2010 at 11:43 AM, Gurpreet Singh
<gu...@gmail.com>wrote:

> This problem still stays unresolved despite numerous restarts to the
> cluster. I cant seem to find a way out of this one, and I am not really
> looking for a workaround, kinda need this to work if i need to go to
> production.
>
> Turned on the ALL logging in log4j, and now I see the following exception
> (EOFException) on the destination. After receiving each file, it seems to be
> throwing this exception. The transfer is successful except for this
> exception. The source successful declares the transfer complete. But the
> destination does not move out of the bootstrapping mode, and just sits
> there.
>
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
> 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
> 87) Removing stream context
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
> DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java
> (line 73) Sending a streaming finished message with
> org.apache.cassandra.streaming.CompletedFileStatus@54828e7 to IP1
> TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line
> 82) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
> 51) Receiving stream
> DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
> 54) Creating file for
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
> 65) Receiving stream: finished reading chunk, awaiting more
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
> 87) Removing stream context
> /data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
> DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java
> (line 73) Sending a streaming finished message with
> org.apache.cassandra.streaming.CompletedFileStatus@7b41a32f to IP1
> TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line
> 82) eof reading from socket; closing
> java.io.EOFException
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
>
> /G
>
> On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh <gurpreet.singh@gmail.com
> > wrote:
>
>> Hi Vineet,
>> I have tracked the nodetool streams to completion each time. Below are the
>> logs on the source and destination node. There are 3 sstables being
>> transferred, and the transfer seems to be successful. However, after the
>> streams finish, the source prints out messages about the dropped messages,
>> which may point to the problem. ideas? I checked port 7000 is open for
>> communication. 9160 is not up on the node being bootstrapped, but that comes
>> up after the node is bootstrapped, is that right?
>>
>> Thanks a ton,
>> /G
>>
>> *Logs on the source node (IP2):*
>> *
>> *
>> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79)
>> Flushing memtables for userdata...
>>  INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95)
>> Performing anticompaction ...
>>  INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
>>  INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129)
>> GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is
>> 4415946752
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db.
>>  49074138589/36770836242 bytes for 5990912 keys.  Time: 1438607ms.
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db.
>>  28185244/21126422 bytes for 47722 keys.  Time: 2310ms.
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java
>> (line 339) AntiCompacting
>> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
>>  INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java
>> (line 396) AntiCompacted to
>> /data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db.
>>  2927724285/2195768325 bytes for 3976118 keys.  Time: 179766ms.
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127)
>> Stream context metadata
>> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369,
>>  3
>> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045,
>>  3
>> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125,
>>  3
>> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132)
>> Sending a stream initiate message to IP1...
>>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137)
>> Waiting for transfer to IP1 to complete
>>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592
>> MessagingService.java (line 501) Dropped 9 messages in the last 1000ms*
>> * *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line
>> 141) Done with transfer to IP1
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db
>>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602
>> MessagingService.java (line 501) Dropped 1 messages in the last 1000ms*
>> * *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/mysql/cassandrastorage/system/LocationInfo-19-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db
>>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
>> SSTableDeletingReference.java (line 104) Deleted
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db
>>
>>
>>
>> *Logs on new node being bootstrapped (IP1):*
>>
>> INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token
>> will be 149298847080838649048722691811739653739 to assume load from IP2
>>  INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388)
>> Joining: sleeping 30000 ms for pending range setup
>>  INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388)
>> Bootstrapping
>>  INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120)
>> Sampling index for
>> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>>  INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>>  INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120)
>> Sampling index for
>> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>>  INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>>  INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180)
>> InetAddress IP2 is now dead.
>>  INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120)
>> Sampling index for
>> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>>  INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java
>> (line 64) Streaming added
>> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605
>> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2
>>  INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578)
>> InetAddress IP2 is now UP
>>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606
>> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
>> endpoint IP2
>>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line
>> 357) LocationInfo has reached its threshold; switching in a fresh Memtable
>> at
>> CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log',
>> position=1353)
>>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line
>> 609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6
>> operations)
>>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line
>> 148) Writing Memtable-LocationInfo@2057079871(210 bytes, 6 operations)
>>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line
>> 162) Completed flushing
>> /data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db
>>
>>
>> On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vi...@gmail.com>wrote:
>>
>>> Hi Gurpreet
>>>
>>> What is the output of  nodetool -h <hostname/IP> streams  -->( to see
>>> what is going on between the nodes) . If you dont see anything happening
>>> try switching off firewall or iptables.
>>>
>>>
>>> Regards
>>> Vineet Daniel
>>> Cell          : +918106217121
>>> Websites :
>>> Blog <http://vinetedaniel.blogspot.com>   |   Linkedin<http://in.linkedin.com/in/vineetdaniel>
>>> |  Twitter <https://twitter.com/vineetdaniel>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh <
>>> gurpreet.singh@gmail.com> wrote:
>>>
>>>> I tried this again, it happenned yet again.
>>>> This time while the transfer messages seemed tobe in order, i also
>>>> noticed that the source logs talk about having 9 dropped messages in the
>>>> last 1000 ms. The only activity on the whole cluster is this bootstrapping,
>>>> there is no read/write traffic going on.
>>>>
>>>> /G
>>>>
>>>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh <
>>>> gurpreet.singh@gmail.com> wrote:
>>>>
>>>>> I am using cassandra 0.6.5.
>>>>>
>>>>>
>>>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <
>>>>> gurpreet.singh@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap
>>>>>> 2 more machines, one at a time.
>>>>>> For both these machines, the bootstrapping stays stuck after the
>>>>>> streaming is done.
>>>>>>
>>>>>> When the nodes come up for bootstrapping, I see all the relevant
>>>>>> messages about getting a new token, assuming load from a particular host. I
>>>>>> see a couple of nodes anticompacting data to send, and at a later point the
>>>>>> node that is bootstrapping prints the right streaming mesgs. However, once
>>>>>> the streaming is over, the node just doesnt do anything. Previously while
>>>>>> bootstrapping, I have seen that after the streaming is done, the node
>>>>>> restarts and becomes part of the ring by itself. I dont see this behaviour
>>>>>> with both the nodes I tried today.
>>>>>> I even restarted all the nodes in the cluster, and tried bootstrapping
>>>>>> one of the nodes again, but it again was stuck after streaming. It seems to
>>>>>> have copied the relevant load as well.
>>>>>> Any ideas as to what could be going on here?
>>>>>>
>>>>>> /G
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Bootstrapping stays stuck

Posted by Gurpreet Singh <gu...@gmail.com>.
This problem still stays unresolved despite numerous restarts to the
cluster. I cant seem to find a way out of this one, and I am not really
looking for a workaround, kinda need this to work if i need to go to
production.

Turned on the ALL logging in log4j, and now I see the following exception
(EOFException) on the destination. After receiving each file, it seems to be
throwing this exception. The transfer is successful except for this
exception. The source successful declares the transfer complete. But the
destination does not move out of the bootstrapping mode, and just sits
there.

DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
65) Receiving stream: finished reading chunk, awaiting more
DEBUG [Thread-15] 2010-09-15 10:56:59,767 IncomingStreamReader.java (line
87) Removing stream context
/data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Index.db:522051369
DEBUG [Thread-15] 2010-09-15 10:56:59,767 StreamCompletionHandler.java (line
73) Sending a streaming finished message with
org.apache.cassandra.streaming.CompletedFileStatus@54828e7 to IP1
TRACE [Thread-15] 2010-09-15 10:56:59,769 IncomingTcpConnection.java (line
82) eof reading from socket; closing
java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
51) Receiving stream
DEBUG [Thread-16] 2010-09-15 10:56:59,812 IncomingStreamReader.java (line
54) Creating file for
/data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db
DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
65) Receiving stream: finished reading chunk, awaiting more
DEBUG [Thread-16] 2010-09-15 10:56:59,876 IncomingStreamReader.java (line
87) Removing stream context
/data/cassandra/datadir/cassandradb/userdata/user_list_items-tmp-1-Filter.db:7489045
DEBUG [Thread-16] 2010-09-15 10:56:59,876 StreamCompletionHandler.java (line
73) Sending a streaming finished message with
org.apache.cassandra.streaming.CompletedFileStatus@7b41a32f to IP1
TRACE [Thread-16] 2010-09-15 10:56:59,876 IncomingTcpConnection.java (line
82) eof reading from socket; closing
java.io.EOFException
        at java.io.DataInputStream.readInt(Unknown Source)
        at
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:59)

/G

On Tue, Sep 14, 2010 at 11:40 AM, Gurpreet Singh
<gu...@gmail.com>wrote:

> Hi Vineet,
> I have tracked the nodetool streams to completion each time. Below are the
> logs on the source and destination node. There are 3 sstables being
> transferred, and the transfer seems to be successful. However, after the
> streams finish, the source prints out messages about the dropped messages,
> which may point to the problem. ideas? I checked port 7000 is open for
> communication. 9160 is not up on the node being bootstrapped, but that comes
> up after the node is bootstrapped, is that right?
>
> Thanks a ton,
> /G
>
> *Logs on the source node (IP2):*
> *
> *
> INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79)
> Flushing memtables for userdata...
>  INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95)
> Performing anticompaction ...
>  INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java
> (line 339) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
>  INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129)
> GC for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is
> 4415946752
>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java
> (line 396) AntiCompacted to
> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db.
>  49074138589/36770836242 bytes for 5990912 keys.  Time: 1438607ms.
>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java
> (line 339) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java
> (line 396) AntiCompacted to
> /data/mysql/cassandrastorage/userdata/stream/user-24-Data.db.
>  28185244/21126422 bytes for 47722 keys.  Time: 2310ms.
>  INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java
> (line 339) AntiCompacting
> [org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
>  INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java
> (line 396) AntiCompacted to
> /data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db.
>  2927724285/2195768325 bytes for 3976118 keys.  Time: 179766ms.
>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127)
> Stream context metadata
> /data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369,
>  3
> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045,
>  3
> sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125,
>  3
> sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325
>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132)
> Sending a stream initiate message to IP1...
>  INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137)
> Waiting for transfer to IP1 to complete
>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592
> MessagingService.java (line 501) Dropped 9 messages in the last 1000ms*
> * *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line 141)
> Done with transfer to IP1
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670
> SSTableDeletingReference.java (line 104) Deleted
> /data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db
>  *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602
> MessagingService.java (line 501) Dropped 1 messages in the last 1000ms*
> * *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133
> SSTableDeletingReference.java (line 104) Deleted
> /data/mysql/cassandrastorage/system/LocationInfo-19-Data.db
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
> SSTableDeletingReference.java (line 104) Deleted
> /data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
> SSTableDeletingReference.java (line 104) Deleted
> /data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db
>
>
>
> *Logs on new node being bootstrapped (IP1):*
>
> INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token
> will be 149298847080838649048722691811739653739 to assume load from IP2
>  INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388)
> Joining: sleeping 30000 ms for pending range setup
>  INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388)
> Bootstrapping
>  INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120)
> Sampling index for
> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>  INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java
> (line 64) Streaming added
> /data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
>  INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120)
> Sampling index for
> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>  INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java
> (line 64) Streaming added
> /data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
>  INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180)
> InetAddress IP2 is now dead.
>  INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120)
> Sampling index for
> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>  INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java
> (line 64) Streaming added
> /data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605
> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2
>  INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578) InetAddress
> IP2 is now UP
>  INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606
> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
> endpoint IP2
>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line
> 357) LocationInfo has reached its threshold; switching in a fresh Memtable
> at
> CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log',
> position=1353)
>  INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line
> 609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6
> operations)
>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line
> 148) Writing Memtable-LocationInfo@2057079871(210 bytes, 6 operations)
>  INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line
> 162) Completed flushing
> /data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db
>
>
> On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vi...@gmail.com>wrote:
>
>> Hi Gurpreet
>>
>> What is the output of  nodetool -h <hostname/IP> streams  -->( to see
>> what is going on between the nodes) . If you dont see anything happening
>> try switching off firewall or iptables.
>>
>>
>> Regards
>> Vineet Daniel
>> Cell          : +918106217121
>> Websites :
>> Blog <http://vinetedaniel.blogspot.com>   |   Linkedin<http://in.linkedin.com/in/vineetdaniel>
>> |  Twitter <https://twitter.com/vineetdaniel>
>>
>>
>>
>>
>>
>> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh <
>> gurpreet.singh@gmail.com> wrote:
>>
>>> I tried this again, it happenned yet again.
>>> This time while the transfer messages seemed tobe in order, i also
>>> noticed that the source logs talk about having 9 dropped messages in the
>>> last 1000 ms. The only activity on the whole cluster is this bootstrapping,
>>> there is no read/write traffic going on.
>>>
>>> /G
>>>
>>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh <
>>> gurpreet.singh@gmail.com> wrote:
>>>
>>>> I am using cassandra 0.6.5.
>>>>
>>>>
>>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <
>>>> gurpreet.singh@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap
>>>>> 2 more machines, one at a time.
>>>>> For both these machines, the bootstrapping stays stuck after the
>>>>> streaming is done.
>>>>>
>>>>> When the nodes come up for bootstrapping, I see all the relevant
>>>>> messages about getting a new token, assuming load from a particular host. I
>>>>> see a couple of nodes anticompacting data to send, and at a later point the
>>>>> node that is bootstrapping prints the right streaming mesgs. However, once
>>>>> the streaming is over, the node just doesnt do anything. Previously while
>>>>> bootstrapping, I have seen that after the streaming is done, the node
>>>>> restarts and becomes part of the ring by itself. I dont see this behaviour
>>>>> with both the nodes I tried today.
>>>>> I even restarted all the nodes in the cluster, and tried bootstrapping
>>>>> one of the nodes again, but it again was stuck after streaming. It seems to
>>>>> have copied the relevant load as well.
>>>>> Any ideas as to what could be going on here?
>>>>>
>>>>> /G
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Bootstrapping stays stuck

Posted by Gurpreet Singh <gu...@gmail.com>.
Hi Vineet,
I have tracked the nodetool streams to completion each time. Below are the
logs on the source and destination node. There are 3 sstables being
transferred, and the transfer seems to be successful. However, after the
streams finish, the source prints out messages about the dropped messages,
which may point to the problem. ideas? I checked port 7000 is open for
communication. 9160 is not up on the node being bootstrapped, but that comes
up after the node is bootstrapped, is that right?

Thanks a ton,
/G

*Logs on the source node (IP2):*
*
*
INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 79)
Flushing memtables for userdata...
 INFO [STREAM-STAGE:1] 2010-09-14 09:54:07,900 StreamOut.java (line 95)
Performing anticompaction ...
 INFO [COMPACTION-POOL:1] 2010-09-14 09:54:07,900 CompactionManager.java
(line 339) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_list_items-5823-Data.db')]
 INFO [GC inspection] 2010-09-14 09:56:54,712 GCInspector.java (line 129) GC
for ParNew: 212 ms, 29033016 reclaimed leaving 579419360 used; max is
4415946752
 INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,508 CompactionManager.java
(line 396) AntiCompacted to
/data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db.
 49074138589/36770836242 bytes for 5990912 keys.  Time: 1438607ms.
 INFO [COMPACTION-POOL:1] 2010-09-14 10:18:06,528 CompactionManager.java
(line 339) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user-22-Data.db')]
 INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,839 CompactionManager.java
(line 396) AntiCompacted to
/data/mysql/cassandrastorage/userdata/stream/user-24-Data.db.
 28185244/21126422 bytes for 47722 keys.  Time: 2310ms.
 INFO [COMPACTION-POOL:1] 2010-09-14 10:18:08,840 CompactionManager.java
(line 339) AntiCompacting
[org.apache.cassandra.io.SSTableReader(path='/data/cassandra/datadir/cassandradb/userdata/user_lists-502-Data.db')]
 INFO [COMPACTION-POOL:1] 2010-09-14 10:21:08,606 CompactionManager.java
(line 396) AntiCompacted to
/data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db.
 2927724285/2195768325 bytes for 3976118 keys.  Time: 179766ms.
 INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,607 StreamOut.java (line 127)
Stream context metadata
/data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Index.db:522051369,
 3
sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Filter.db:7489045,
 3
sstables./data/cassandra/datadir/cassandradb/userdata/stream/user_list_items-5825-Data.db:36770836242,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Index.db:3373143,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Filter.db:59965,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user-24-Data.db:21126422,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Index.db:282956452,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Filter.db:4970125,
 3
sstables./data/mysql/cassandrastorage/userdata/stream/user_lists-504-Data.db:2195768325
 INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 132)
Sending a stream initiate message to IP1...
 INFO [STREAM-STAGE:1] 2010-09-14 10:21:08,608 StreamOut.java (line 137)
Waiting for transfer to IP1 to complete
 *WARN [DroppedMessagesLogger] 2010-09-14 10:28:00,592 MessagingService.java
(line 501) Dropped 9 messages in the last 1000ms*
* *INFO [STREAM-STAGE:1] 2010-09-14 10:28:00,605 StreamOut.java (line 141)
Done with transfer to IP1
 INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:00,670
SSTableDeletingReference.java (line 104) Deleted
/data/cassandra/datadir/cassandradb/system/LocationInfo-17-Data.db
 *WARN [DroppedMessagesLogger] 2010-09-14 10:28:01,602 MessagingService.java
(line 501) Dropped 1 messages in the last 1000ms*
* *INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,133
SSTableDeletingReference.java (line 104) Deleted
/data/mysql/cassandrastorage/system/LocationInfo-19-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
SSTableDeletingReference.java (line 104) Deleted
/data/cassandra/datadir/cassandradb/system/LocationInfo-18-Data.db
 INFO [SSTABLE-CLEANUP-TIMER] 2010-09-14 10:28:06,134
SSTableDeletingReference.java (line 104) Deleted
/data/cassandra/datadir/cassandradb/system/LocationInfo-20-Data.db



*Logs on new node being bootstrapped (IP1):*

INFO [main] 2010-09-14 09:53:37,788 BootStrapper.java (line 104) New token
will be 149298847080838649048722691811739653739 to assume load from IP2
 INFO [main] 2010-09-14 09:53:37,789 StorageService.java (line 388) Joining:
sleeping 30000 ms for pending range setup
 INFO [main] 2010-09-14 09:54:07,792 StorageService.java (line 388)
Bootstrapping
 INFO [Thread-17] 2010-09-14 10:26:40,699 SSTableReader.java (line 120)
Sampling index for
/data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
 INFO [Thread-17] 2010-09-14 10:26:51,453 StreamCompletionHandler.java (line
64) Streaming added
/data/mysql/cassandrastorage/userdata/user_list_items-2-Data.db
 INFO [Thread-23] 2010-09-14 10:26:52,219 SSTableReader.java (line 120)
Sampling index for
/data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
 INFO [Thread-23] 2010-09-14 10:26:52,311 StreamCompletionHandler.java (line
64) Streaming added
/data/cassandra/datadir/cassandradb/userdata/user-2-Data.db
 INFO [Timer-0] 2010-09-14 10:27:13,676 Gossiper.java (line 180) InetAddress
IP2 is now dead.
 INFO [Thread-26] 2010-09-14 10:27:18,858 SSTableReader.java (line 120)
Sampling index for
/data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
 INFO [Thread-26] 2010-09-14 10:27:24,645 StreamCompletionHandler.java (line
64) Streaming added
/data/cassandra/datadir/cassandradb/userdata/user_lists-2-Data.db
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,605
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint IP2
 INFO [GMFD:1] 2010-09-14 10:28:00,605 Gossiper.java (line 578) InetAddress
IP2 is now UP
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-14 10:28:00,606
HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
endpoint IP2
 INFO [FLUSH-TIMER] 2010-09-14 10:57:52,381 ColumnFamilyStore.java (line
357) LocationInfo has reached its threshold; switching in a fresh Memtable
at
CommitLogContext(file='/data/cassandra/datadir/commitlog/CommitLog-1284483127473.log',
position=1353)
 INFO [FLUSH-TIMER] 2010-09-14 10:57:52,382 ColumnFamilyStore.java (line
609) Enqueuing flush of Memtable-LocationInfo@2057079871(210 bytes, 6
operations)
 INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,382 Memtable.java (line 148)
Writing Memtable-LocationInfo@2057079871(210 bytes, 6 operations)
 INFO [FLUSH-WRITER-POOL:1] 2010-09-14 10:57:52,464 Memtable.java (line 162)
Completed flushing
/data/cassandra/datadir/cassandradb/system/LocationInfo-10-Data.db


On Tue, Sep 14, 2010 at 10:52 AM, vineet daniel <vi...@gmail.com>wrote:

> Hi Gurpreet
>
> What is the output of  nodetool -h <hostname/IP> streams  -->( to see what
> is going on between the nodes) . If you dont see anything happening try
> switching off firewall or iptables.
>
>
> Regards
> Vineet Daniel
> Cell          : +918106217121
> Websites :
> Blog <http://vinetedaniel.blogspot.com>   |   Linkedin<http://in.linkedin.com/in/vineetdaniel>
> |  Twitter <https://twitter.com/vineetdaniel>
>
>
>
>
>
> On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh <gurpreet.singh@gmail.com
> > wrote:
>
>> I tried this again, it happenned yet again.
>> This time while the transfer messages seemed tobe in order, i also noticed
>> that the source logs talk about having 9 dropped messages in the last 1000
>> ms. The only activity on the whole cluster is this bootstrapping, there is
>> no read/write traffic going on.
>>
>> /G
>>
>> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh <
>> gurpreet.singh@gmail.com> wrote:
>>
>>> I am using cassandra 0.6.5.
>>>
>>>
>>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <
>>> gurpreet.singh@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
>>>> more machines, one at a time.
>>>> For both these machines, the bootstrapping stays stuck after the
>>>> streaming is done.
>>>>
>>>> When the nodes come up for bootstrapping, I see all the relevant
>>>> messages about getting a new token, assuming load from a particular host. I
>>>> see a couple of nodes anticompacting data to send, and at a later point the
>>>> node that is bootstrapping prints the right streaming mesgs. However, once
>>>> the streaming is over, the node just doesnt do anything. Previously while
>>>> bootstrapping, I have seen that after the streaming is done, the node
>>>> restarts and becomes part of the ring by itself. I dont see this behaviour
>>>> with both the nodes I tried today.
>>>> I even restarted all the nodes in the cluster, and tried bootstrapping
>>>> one of the nodes again, but it again was stuck after streaming. It seems to
>>>> have copied the relevant load as well.
>>>> Any ideas as to what could be going on here?
>>>>
>>>> /G
>>>>
>>>
>>>
>>
>

Re: Bootstrapping stays stuck

Posted by vineet daniel <vi...@gmail.com>.
Hi Gurpreet

What is the output of  nodetool -h <hostname/IP> streams  -->( to see what
is going on between the nodes) . If you dont see anything happening try
switching off firewall or iptables.


Regards
Vineet Daniel
Cell          : +918106217121
Websites :
Blog <http://vinetedaniel.blogspot.com>   |
Linkedin<http://in.linkedin.com/in/vineetdaniel>
|  Twitter <https://twitter.com/vineetdaniel>





On Tue, Sep 14, 2010 at 11:11 PM, Gurpreet Singh
<gu...@gmail.com>wrote:

> I tried this again, it happenned yet again.
> This time while the transfer messages seemed tobe in order, i also noticed
> that the source logs talk about having 9 dropped messages in the last 1000
> ms. The only activity on the whole cluster is this bootstrapping, there is
> no read/write traffic going on.
>
> /G
>
> On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh <gurpreet.singh@gmail.com
> > wrote:
>
>> I am using cassandra 0.6.5.
>>
>>
>> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <gurpreet.singh@gmail.com
>> > wrote:
>>
>>> Hi,
>>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
>>> more machines, one at a time.
>>> For both these machines, the bootstrapping stays stuck after the
>>> streaming is done.
>>>
>>> When the nodes come up for bootstrapping, I see all the relevant messages
>>> about getting a new token, assuming load from a particular host. I see a
>>> couple of nodes anticompacting data to send, and at a later point the node
>>> that is bootstrapping prints the right streaming mesgs. However, once the
>>> streaming is over, the node just doesnt do anything. Previously while
>>> bootstrapping, I have seen that after the streaming is done, the node
>>> restarts and becomes part of the ring by itself. I dont see this behaviour
>>> with both the nodes I tried today.
>>> I even restarted all the nodes in the cluster, and tried bootstrapping
>>> one of the nodes again, but it again was stuck after streaming. It seems to
>>> have copied the relevant load as well.
>>> Any ideas as to what could be going on here?
>>>
>>> /G
>>>
>>
>>
>

Re: Bootstrapping stays stuck

Posted by Gurpreet Singh <gu...@gmail.com>.
I tried this again, it happenned yet again.
This time while the transfer messages seemed tobe in order, i also noticed
that the source logs talk about having 9 dropped messages in the last 1000
ms. The only activity on the whole cluster is this bootstrapping, there is
no read/write traffic going on.

/G

On Tue, Sep 14, 2010 at 10:05 AM, Gurpreet Singh
<gu...@gmail.com>wrote:

> I am using cassandra 0.6.5.
>
>
> On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <gu...@gmail.com>wrote:
>
>> Hi,
>> I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
>> more machines, one at a time.
>> For both these machines, the bootstrapping stays stuck after the streaming
>> is done.
>>
>> When the nodes come up for bootstrapping, I see all the relevant messages
>> about getting a new token, assuming load from a particular host. I see a
>> couple of nodes anticompacting data to send, and at a later point the node
>> that is bootstrapping prints the right streaming mesgs. However, once the
>> streaming is over, the node just doesnt do anything. Previously while
>> bootstrapping, I have seen that after the streaming is done, the node
>> restarts and becomes part of the ring by itself. I dont see this behaviour
>> with both the nodes I tried today.
>> I even restarted all the nodes in the cluster, and tried bootstrapping one
>> of the nodes again, but it again was stuck after streaming. It seems to have
>> copied the relevant load as well.
>> Any ideas as to what could be going on here?
>>
>> /G
>>
>
>

Re: Bootstrapping stays stuck

Posted by Gurpreet Singh <gu...@gmail.com>.
I am using cassandra 0.6.5.

On Tue, Sep 14, 2010 at 9:16 AM, Gurpreet Singh <gu...@gmail.com>wrote:

> Hi,
> I have a cassandra cluster of 4 machines, and I am trying to bootstrap 2
> more machines, one at a time.
> For both these machines, the bootstrapping stays stuck after the streaming
> is done.
>
> When the nodes come up for bootstrapping, I see all the relevant messages
> about getting a new token, assuming load from a particular host. I see a
> couple of nodes anticompacting data to send, and at a later point the node
> that is bootstrapping prints the right streaming mesgs. However, once the
> streaming is over, the node just doesnt do anything. Previously while
> bootstrapping, I have seen that after the streaming is done, the node
> restarts and becomes part of the ring by itself. I dont see this behaviour
> with both the nodes I tried today.
> I even restarted all the nodes in the cluster, and tried bootstrapping one
> of the nodes again, but it again was stuck after streaming. It seems to have
> copied the relevant load as well.
> Any ideas as to what could be going on here?
>
> /G
>