Posted to user@cassandra.apache.org by Stan Lemon <sl...@salesforce.com> on 2015/08/04 20:02:06 UTC

Long joining node

Hello,
I have a cluster with 12 nodes in each of 2 datacenters, for a total of 24
nodes.

I am attempting to add a 13th node in one of the datacenters. I have been
monitoring this process from the node itself with nodetool netstats and
from one of the existing nodes using nodetool status.

On the existing node I see the new node as UJ. I have watched the load
steadily climb to about 203.4 GB, and then over the last two hours it has
fluctuated a bit and slowly dropped to about 203.1 GB.

On the node that I am adding, I watched over several hours as nodetool
netstats received data; however, for the last couple of hours nodetool
netstats has simply shown the IPs of the other nodes in the cluster. It
looks something like this:

Mode: JOINING
Bootstrap 659153b0-3ab6-11e5-8c94-5dd79366f3d9
    /10.1.82.160
    /10.1.82.162
    /10.1.82.80
    /10.2.123.74
    /10.1.82.166
    /10.1.82.158
    /10.1.82.168
    /10.1.82.150
    /10.1.82.148
    /10.2.123.2
    /10.1.82.152
    /10.1.82.156
    /10.84.78.120
    /10.2.123.80
    /10.2.123.78
    /10.81.122.64
    /10.2.123.82
    /10.2.123.84
    /10.1.82.164
    /10.81.122.62
    /10.2.123.76
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0             24
Responses                       n/a         0        1090793


So I'm trying to figure out... What is the node doing? Why is it still
joining? How long should I wait before being concerned?
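
For reference, the checks described above boil down to a handful of commands. A sketch (the log path assumes a default package install; run the first two on the joining node and the status check on an existing one):

```shell
# On the joining node: streaming activity and any compaction backlog.
nodetool netstats
nodetool compactionstats

# On an existing node: the new node should show as UJ (Up/Joining).
nodetool status

# Watch the system log for streaming/bootstrap messages.
tail -f /var/log/cassandra/system.log | grep -iE 'stream|bootstrap'
```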

Also...

The UUID next to the word 'Bootstrap' is NOT the host ID of the node
joining, it's actually the UUID of a different node already in the cluster.
This seems concerning to me, but again I'm not sure if this is expected
behavior or not.

ANY help would be greatly appreciated.

Thanks,
Stan

Re: Long joining node

Posted by Stan Lemon <sl...@salesforce.com>.
I'm running 2.0.11. These nodes are on bare metal at SoftLayer.

After I sent my first post, a RuntimeException popped up in the logs; I'm
not sure whether it's related:

ERROR 14:13:17,648 Exception in thread Thread[CompactionExecutor:505,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(7750448305743929847, 003076697369746f725f706167655f766965613162616364663133646265316336373266363363666466366361383330323100004930303030616336342d303030302d303033302d343030302d3030303030303030616336343a64323065323964632d643836342d313165342d383030302d30303030306666636666366500) >= current key DecoratedKey(-8457751561836812744, 002c6f626a6563745f6175646974386164626664623835326465363732633935346338313261306638373264663700005230303030326263322d303030302d303033302d343030302d3030303030303030326263323a30303030326263322d303030302d303033302d343030302d3030303030336331373630303a50726f737065637400) writing into /var/lib/cassandra/data/pi/__shardindex/pi-__shardindex-tmp-jb-657-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:170)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
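
For what it's worth, a "last written key >= current key" error means a compaction hit out-of-order partition keys in an SSTable, which is the kind of corruption the scrub tools are meant to repair. A sketch (keyspace `pi` and table `__shardindex` are taken from the path in the error; the offline tool requires Cassandra to be stopped on that node first):

```shell
# Online scrub of the affected table, while the node is running:
nodetool scrub pi __shardindex

# Or the offline variant, with Cassandra stopped on the node:
sstablescrub pi __shardindex
```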



On Tue, Aug 4, 2015 at 2:14 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon <sl...@salesforce.com> wrote:
>
>> I am attempting to add a 13th node in one of the datacenters. I have been
>> monitoring this process from the node itself with nodetool netstats and
>> from one of the existing nodes using nodetool status.
>>
>> On the existing node I see the new node as UJ. I have watched the load
>> steadily climb up to about 203.4gb, and then over the last two hours it has
>> fluctuated a bit and has been steadily dropping to about 203.1gb
>>
>
> It's probably hung. If I were you I'd probably wipe the node and
> re-bootstrap.
>
> (what version of cassandra/what network are you on (AWS?)/etc.)
>
> =Rob
>
>
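
Rob's "wipe the node and re-bootstrap" amounts to roughly the following on the joining node. This is a sketch assuming default package paths and service management; adjust for your install, and only ever do this on a node that owns no data you need:

```shell
# Stop Cassandra on the stuck joining node.
sudo service cassandra stop

# Remove its data, commit log, and saved caches so it starts clean.
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*

# Start it again; it will bootstrap and re-stream from scratch.
sudo service cassandra start
```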

Re: Long joining node

Posted by Stan Lemon <sl...@salesforce.com>.
It was suggested to me that I try running scrub on the other nodes in the
cluster, since the runtime exceptions I was seeing might be caused by some
bad data. I am going to try that this morning and see how things go. I'm
not sure how long is long enough for nodetool scrub to run on a box, though.
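
That plan sketched out, one node at a time (keyspace `pi` and table `__shardindex` come from the path in the compaction error; scrub runs as a compaction, so its progress shows up alongside other compaction tasks):

```shell
# Scrub the suspect table on an existing node:
nodetool scrub pi __shardindex

# Check on it periodically -- scrub appears as a compaction task:
nodetool compactionstats
```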

As for the load... Here's the spread on the current cluster:

[stan.lemon@cass-d101 ~]$ nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: DALLAS
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns  Host ID                               Rack
UN  10.86.123.86  276.19 GB  256     4.0%  2386b94b-fe99-4cb0-8053-321c0540db45  RAC1
UN  10.81.122.66  261.38 GB  256     4.4%  b4533802-83c3-4e57-bbea-6b63294ba377  RAC1
UN  10.81.122.64  266.85 GB  256     4.3%  391a6dfc-254a-43cf-8f25-5518e8ab6511  RAC1
UN  10.86.123.84  290.27 GB  256     4.2%  14979aeb-e0a8-4f7d-866e-0e701a4f774f  RAC1
UN  10.86.123.82  289.96 GB  256     4.5%  65df8d81-0ec1-4f67-81c1-06e86e48593a  RAC1
UN  10.86.123.80  290.81 GB  256     4.4%  c4276398-0c76-4802-b92e-e08a3a0e319f  RAC1
UN  10.84.78.120  290.74 GB  256     4.5%  fce37c3d-c142-40b5-978c-ab8e59939b2f  RAC1
UN  10.84.78.118  287.85 GB  256     4.3%  cfd64c76-fb08-4a3a-b88e-bc19c45115c6  RAC1
UN  10.86.123.78  290.96 GB  256     4.1%  32cc866f-7b5f-4310-ac4a-e0f5dd650b78  RAC1
UN  10.86.123.76  295.52 GB  256     4.1%  bb1b80ba-28bf-4a39-9623-16e326eaaf09  RAC1
UN  10.81.122.62  286.81 GB  256     4.1%  ef255fd1-beee-4dc0-80f5-9ae2271c6398  RAC1
UN  10.86.123.74  303.25 GB  256     4.3%  041d7ab7-d1bd-4a79-afb7-9c6ab1857ee9  RAC1
Datacenter: SEATTLE
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns  Host ID                               Rack
UN  10.29.82.80   297.11 GB  256     4.3%  a0e61c1e-e48f-4ccd-afa4-5069d5671382  RAC1
UN  10.29.82.156  304.74 GB  256     4.3%  d17abc57-eb47-41de-8cd5-a341a38b16de  RAC1
UN  10.29.82.158  289.63 GB  256     4.4%  f47d4019-7fd9-4620-9465-d1199311de36  RAC1
UN  10.29.82.152  285.99 GB  256     4.1%  23ee0c6f-5ac7-475a-be13-7d0536619da3  RAC1
UN  10.29.82.168  285.39 GB  256     3.8%  f5f2f55c-e316-4281-b472-f572601c7618  RAC1
UN  10.29.82.154  287.8 GB   256     4.0%  29cd9781-985a-49ed-9910-46279f50bbba  RAC1
UN  10.29.82.166  282.9 GB   256     4.1%  627b0a9e-c0d0-4a90-9cbe-22f7fbb81f9f  RAC1
UN  10.29.82.148  291.17 GB  256     4.0%  c52b467f-8960-4c4f-951a-b4232bbd25ee  RAC1
UN  10.29.82.164  269.74 GB  256     3.9%  7fba7779-c705-45bb-a0ae-26a5dff93374  RAC1
UN  10.29.82.150  281.93 GB  256     4.1%  63165266-bfda-4bd5-b339-e103546bb853  RAC1
UN  10.29.82.162  294.11 GB  256     3.9%  933a495f-4ed7-4bf9-97d7-2ce2c58f5200  RAC1
UN  10.29.82.160  261.22 GB  256     4.0%  7baaeb81-b46b-441a-bb29-914247ec3fac  RAC1

On Wed, Aug 5, 2015 at 9:54 PM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

> What's your average data per node? Is 230gb close?
>
> All the best,
>
>
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> [image: linkedin.png] <https://www.linkedin.com/company/datastax> [image:
> facebook.png] <https://www.facebook.com/datastax> [image: twitter.png]
> <https://twitter.com/datastax> [image: g+.png]
> <https://plus.google.com/+Datastax/about>
> <http://feeds.feedburner.com/datastax>
>
>
> <http://cassandrasummit-datastax.com/?utm_campaign=summit15&utm_medium=summiticon&utm_source=emailsignature>
>
> DataStax is the fastest, most scalable distributed database technology,
> delivering Apache Cassandra to the world’s most innovative enterprises.
> Datastax is built to be agile, always-on, and predictably scalable to any
> size. With more than 500 customers in 45 countries, DataStax is the
> database technology and transactional backbone of choice for the worlds
> most innovative companies such as Netflix, Adobe, Intuit, and eBay.

Re: Long joining node

Posted by Sebastian Estevez <se...@datastax.com>.
What's your average data per node? Is 230gb close?


On Wed, Aug 5, 2015 at 8:33 AM, Stan Lemon <sl...@salesforce.com> wrote:

> I set the stream timeout to 1 hour this morning and started fresh trying
> to join this node.  It took about an hour to stream over 230gb of data, and
> then into hour 2 I wound up back where I was yesterday, the node's load is
> slowly reducing and the netstats does not show sending or receiving
> anything.  I'm not sure how long I should wait before I throw the towel in
> on this attempt. I'm also not really sure what to try next...

Re: Long joining node

Posted by Stan Lemon <sl...@salesforce.com>.
I set the stream timeout to 1 hour this morning and started fresh trying to
join this node. It took about an hour to stream over 230 GB of data, and
then into hour two I wound up back where I was yesterday: the node's load
is slowly shrinking, and netstats does not show it sending or receiving
anything. I'm not sure how long I should wait before I throw in the towel
on this attempt, and I'm not really sure what to try next...

The only things in the logs currently are three entries like this:

ERROR 07:39:44,447 Exception in thread Thread[CompactionExecutor:31,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(8633837336094175369, 003076697369746f725f706167655f766965623936636232346331623661313935313634346638303838393465313132373700004930303030663264632d303030302d303033302d343030302d3030303030303030663264633a66376436366166382d383564352d313165342d383030302d30303030303035343764623600) >= current key DecoratedKey(-6568345298384940765, 003076697369746f725f706167655f766965623936636232346331623661313935313634346638303838393465313132373700004930303030376464652d303030302d303033302d343030302d3030303030303030376464653a64633930336533382d643766342d313165342d383030302d30303030303730626338386300) writing into /var/lib/cassandra/data/pi/__shardindex/pi-__shardindex-tmp-jb-644-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:170)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)



ANY help is greatly appreciated.

Thanks,
Stan





On Tue, Aug 4, 2015 at 2:23 PM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

> That's the one. I set it to an hour to be safe (if a stream goes above the
> timeout it will get restarted) but it can probably be lower.

Re: Long joining node

Posted by Sebastian Estevez <se...@datastax.com>.
That's the one. I set it to an hour to be safe (if a stream goes above the
timeout it will get restarted) but it can probably be lower.
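
As a config sketch, the change under discussion looks like this in cassandra.yaml (3,600,000 ms = 1 hour; 0, the default, disables the timeout, and the node typically needs a restart to pick the change up):

```yaml
# cassandra.yaml
streaming_socket_timeout_in_ms: 3600000
```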


On Tue, Aug 4, 2015 at 2:21 PM, Stan Lemon <sl...@salesforce.com> wrote:

> Sebastian,
> You're referring to streaming_socket_timeout_in_ms correct?  What value do
> you recommend?  All of my nodes are currently at the default 0.
>
> Thanks,
> Stan

Re: Long joining node

Posted by Stan Lemon <sl...@salesforce.com>.
Sebastian,
You're referring to streaming_socket_timeout_in_ms correct?  What value do
you recommend?  All of my nodes are currently at the default 0.

Thanks,
Stan


On Tue, Aug 4, 2015 at 2:16 PM, Sebastian Estevez <
sebastian.estevez@datastax.com> wrote:

> It helps to set stream socket timeout in the yaml so that you don't hang
> forever on a lost / broken stream.
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> On Tue, Aug 4, 2015 at 2:14 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon <sl...@salesforce.com>
>> wrote:
>>
>>> I am attempting to add a 13th node in one of the datacenters. I have
>>> been monitoring this process from the node itself with nodetool netstats
>>> and from one of the existing nodes using nodetool status.
>>>
>>> On the existing node I see the new node as UJ. I have watched the load
>>> steadily climb up to about 203.4gb, and then over the last two hours it has
>>> fluctuated a bit and has been steadily dropping to about 203.1gb
>>>
>>
>> It's probably hung. If I were you I'd probably wipe the node and
>> re-bootstrap.
>>
>> (what version of cassandra/what network are you on (AWS?)/etc.)
>>
>> =Rob
>>
>>
>
>

Re: Long joining node

Posted by Sebastian Estevez <se...@datastax.com>.
It helps to set stream socket timeout in the yaml so that you don't hang
forever on a lost / broken stream.
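One concrete way to do that (the question upthread confirms the setting name is streaming_socket_timeout_in_ms; the value below is only an illustrative choice, not an official recommendation):

```yaml
# cassandra.yaml -- the default of 0 means streams never time out,
# so a dead peer can leave a bootstrap hanging indefinitely.
# Pick a value comfortably longer than your slowest expected stream;
# one hour is a common starting point.
streaming_socket_timeout_in_ms: 3600000   # 1 hour
```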

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
DataStax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the world’s
most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Tue, Aug 4, 2015 at 2:14 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon <sl...@salesforce.com> wrote:
>
>> I am attempting to add a 13th node in one of the datacenters. I have been
>> monitoring this process from the node itself with nodetool netstats and
>> from one of the existing nodes using nodetool status.
>>
>> On the existing node I see the new node as UJ. I have watched the load
>> steadily climb up to about 203.4gb, and then over the last two hours it has
>> fluctuated a bit and has been steadily dropping to about 203.1gb
>>
>
> It's probably hung. If I were you I'd probably wipe the node and
> re-bootstrap.
>
> (what version of cassandra/what network are you on (AWS?)/etc.)
>
> =Rob
>
>

Re: Long joining node

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon <sl...@salesforce.com> wrote:

> I am attempting to add a 13th node in one of the datacenters. I have been
> monitoring this process from the node itself with nodetool netstats and
> from one of the existing nodes using nodetool status.
>
> On the existing node I see the new node as UJ. I have watched the load
> steadily climb up to about 203.4gb, and then over the last two hours it has
> fluctuated a bit and has been steadily dropping to about 203.1gb
>

It's probably hung. If I were you, I'd wipe the node and re-bootstrap.
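A minimal sketch of that wipe-and-re-bootstrap procedure. The directory layout is an assumption from a default package install (real installs usually use /var/lib/cassandra/{data,commitlog,saved_caches}; use whatever your cassandra.yaml says); the defaults here point at a throwaway ./demo tree so the sketch is safe to run as-is:

```shell
#!/bin/sh
set -eu

# Assumed paths for illustration only -- substitute the
# data_file_directories, commitlog_directory, and
# saved_caches_directory from your own cassandra.yaml.
DATA_DIR="${DATA_DIR:-./demo/cassandra/data}"
COMMITLOG_DIR="${COMMITLOG_DIR:-./demo/cassandra/commitlog}"
SAVED_CACHES_DIR="${SAVED_CACHES_DIR:-./demo/cassandra/saved_caches}"

# Simulate a node with partially streamed data left over.
mkdir -p "$DATA_DIR" "$COMMITLOG_DIR" "$SAVED_CACHES_DIR"
touch "$DATA_DIR/partial-sstable.db" "$COMMITLOG_DIR/CommitLog-1.log"

# 1. Stop the stuck joiner first (placeholder; depends on your init system):
#      sudo service cassandra stop

# 2. Wipe everything the failed bootstrap wrote. The :? guards abort
#    if a variable is somehow empty, so this never expands to "rm -rf /*".
rm -rf "${DATA_DIR:?}"/* "${COMMITLOG_DIR:?}"/* "${SAVED_CACHES_DIR:?}"/*

# 3. Start it back up and it will re-bootstrap from scratch:
#      sudo service cassandra start

echo "files left in data dir: $(ls -A "$DATA_DIR" | wc -l)"
```

Wiping all three directories matters: a leftover commitlog or saved cache from the failed attempt can make the node think it already owns tokens instead of bootstrapping cleanly.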

(what version of cassandra/what network are you on (AWS?)/etc.)

=Rob