Posted to user@cassandra.apache.org by Léo FERLIN SUTTON <lf...@mailjet.com.INVALID> on 2019/03/12 16:46:24 UTC

Re: Bootstrap keeps failing

Hello !

Just wanted to let you know: we finally managed to find a solution!

First of all we increased `streaming_socket_timeout_in_ms` to `86400000`.
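
For anyone wanting to do the same, this is the corresponding line in
`cassandra.yaml` (86400000 ms = 24 hours):

  streaming_socket_timeout_in_ms: 86400000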

We are using cassandra-reaper to manage our repairs; they last about 15
days on this cluster and are re-launched almost immediately once they are
finished.
Before bootstrapping we paused the repair operations and launched a
bootstrap.

With these changes we were able to bootstrap a node without any errors. Not
sure if it is due to the new `streaming_socket_timeout_in_ms` or to the
pause of repairs, but it now works!

Regards,

Leo

On Thu, Feb 14, 2019 at 7:41 PM Léo FERLIN SUTTON <lf...@mailjet.com>
wrote:

> On Thu, Feb 14, 2019 at 6:56 PM Kenneth Brotman
> <ke...@yahoo.com.invalid> wrote:
>
>> Those aren’t the same error messages so I think progress has been made.
>>
>>
>>
>> What version of C* are you running?
>>
> 3.0.17. We will upgrade to 3.0.18 soon.
>
>> How did you clear out the space?
>>
> I had a few topology changes to clean up after. `nodetool cleanup` did miracles.
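>
> For illustration, cleanup can be run per keyspace so it can be spread out
> over time (the keyspace name below is just a placeholder; with no arguments
> it runs on all non-system keyspaces):
>
>     nodetool cleanup my_keyspace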
>
> Regards,
>
> Leo
>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lferlin@mailjet.com.INVALID]
>> *Sent:* Thursday, February 14, 2019 7:54 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Bootstrap keeps failing
>>
>>
>>
>> Hello again !
>>
>>
>>
>> I have managed to free a lot of disk space and now most nodes hover
>> between 50% and 80% disk usage.
>>
>> I am still getting bootstrapping failures :(
>>
>>
>>
>> Here are some logs:
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err
>> cassandra  [org.apache.cassandra.streaming.StreamSession] [onError] -
>> [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Streaming error occurred
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user info
>> cassandra  [org.apache.cassandra.streaming.StreamResultFuture]
>> [handleSessionComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d]
>> Session with /10.10.23.155 is complete
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.streaming.StreamResultFuture]
>> [maybeComplete] - [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Stream
>> failed
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.service.StorageService] [onFailure] -
>> Error during bootstrap.
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err
>> cassandra  [org.apache.cassandra.service.StorageService] [bootstrap] -
>> Error while waiting on bootstrap to complete. Bootstrap will have to be
>> restarted.
>>
>> 2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning
>> cassandra  [org.apache.cassandra.service.StorageService] [joinTokenRing] -
>> Some data streaming failed. Use nodetool to check bootstrap state and
>> resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
>>
>>
>>
>>
>>
>> I can see a `Streaming error occurred` for all of the nodes it is trying to
>> stream from. Is there a way to get more detailed logs to understand why the
>> failures occurred?
>>
>> I have set `<logger name="org.apache.cassandra.streaming.StreamSession"
>> level="DEBUG" />` but it doesn't seem to give me more details. Is there
>> another class I should set to DEBUG?
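>>
>> One option (a sketch, assuming the stock logback.xml shipped with Cassandra)
>> is to raise the whole streaming package to DEBUG rather than a single class:
>>
>>     <logger name="org.apache.cassandra.streaming" level="DEBUG" />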
>>
>>
>>
>> Finally I have also noticed a lot of :
>>
>> [org.apache.cassandra.db.compaction.LeveledManifest]
>> [getCompactionCandidates] - Bootstrapping - doing STCS in L0
>>
>> in my log files. It might be important.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Fri, Feb 8, 2019 at 3:59 PM Léo FERLIN SUTTON <lf...@mailjet.com>
>> wrote:
>>
>> On Fri, Feb 8, 2019 at 3:37 PM Kenneth Brotman
>> <ke...@yahoo.com.invalid> wrote:
>>
>> Thanks for the details; that helps us understand the situation. I'm
>> pretty sure you've exceeded the working capacity of some of those nodes.
>> Going over 50%-75%, depending on compaction strategy, is ill-advised.
>>
>> 50% free disk space is a steep price to pay for disk space not used. We
>> have about 90 terabytes of data on SSD and we are paying about $100 per
>> terabyte of SSD storage (on Google Cloud).
>>
>> Maybe we can get closer to 75%.
>>
>>
>>
>> Our compaction strategy is `LeveledCompactionStrategy` on our two biggest
>> tables (90% of the data).
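>>
>> For context, this is the kind of per-table setting I mean, in CQL (the
>> keyspace/table name is a placeholder, not our real schema):
>>
>>     ALTER TABLE my_keyspace.my_table
>>       WITH compaction = {'class': 'LeveledCompactionStrategy'};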
>>
>>
>>
>> You need to clear out as much room as possible to add more nodes.
>>
>> Are the tombstones clearing out?
>>
>> I don't think we have a lot of tombstones:
>>
>> We have 0 deletes on our two biggest tables.
>>
>> One of them gets updated with new data (messages.messages), but the
>> updates fill columns that were previously empty. I am unsure, but I think
>> this doesn't cause any tombstones.
>>
>> I have attached the info from `nodetool tablestats` for our two largest
>> tables.
>>
>>
>>
>> We are using cassandra-reaper to manage our repairs. A full repair
>> takes about 13 days, so if we have tombstones they should not be older than
>> 13 days.
>>
>>
>>
>> Are there old snapshots that you can delete? And so on.
>>
>> Unfortunately no. We take a daily snapshot that we backup then drop.
>>
>>
>>
>> You have to make more room on the existing nodes.
>>
>>
>>
>> I am trying to run `nodetool cleanup` on our most "critical" nodes to see
>> if it helps. If that doesn't do the trick we will only have two solutions :
>>
>>    - Add more disk space on each node
>>    - Adding new nodes
>>
>> We have looked at some other companies' case studies, and it looks like we
>> have a few very big nodes instead of a lot of smaller ones.
>>
>> We are currently trying to add nodes, and are hoping to eventually
>> transition to a "lot of small nodes" model and be able to add nodes a lot
>> faster.
>>
>>
>>
>> Thank you again for your interest,
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lferlin@mailjet.com.INVALID]
>> *Sent:* Friday, February 08, 2019 6:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Bootstrap keeps failing
>>
>>
>>
>> On Thu, Feb 7, 2019 at 10:11 PM Kenneth Brotman
>> <ke...@yahoo.com.invalid> wrote:
>>
>> Lots of things come to mind. We need more information from you to help us
>> understand:
>>
>> How long have you had your cluster running?
>>
>> A bit more than a year. But it has been constantly growing (3 nodes
>> to 6 nodes to 12 nodes, etc.).
>>
>> We have a replication_factor of 3 on all keyspaces and 3 racks with an
>> equal amount of nodes.
>>
>>
>>
>> Is it generally working ok?
>>
>> Works fine. Good performance, repairs managed by cassandra-reaper.
>>
>>
>>
>> Is it just one node that is misbehaving at a time?
>>
>> We only bootstrap nodes one at a time. Sometimes it works flawlessly,
>> sometimes it fails. When it fails, it tends to fail many times in a row
>> before we manage to get the node bootstrapped.
>>
>>
>>
>> How many nodes do you need to replace?
>>
>> I am adding nodes, not replacing any. Our nodes are starting to get very
>> full and we wish to add at least 6 more nodes (short-term).
>>
>> Adding a new node is quite slow (48 to 72 hours), and that's when the
>> bootstrap process works on the first try.
>>
>>
>>
>> Are you doing rolling restarts instead of simultaneously?
>>
>> Yes.
>>
>>
>>
>> Do you have enough capacity on your machines?  Did you say some of the
>> nodes are at 90% capacity?
>>
>> Used disk space fluctuates but is generally between 80% and 90%,
>> which is why we are planning to add a lot more nodes.
>>
>>
>>
>> When did this problem begin?
>>
>> Not sure about this one. Probably since our nodes have more than 2 TB of
>> data; I don't remember it being an issue when our nodes were smaller.
>>
>>
>>
>> Could something be causing a race condition?
>>
>> We have schema changes every day.
>>
>> We have temporary data stored in Cassandra that is only used for 6 days and
>> then destroyed.
>>
>>
>>
>> In order to avoid tombstones we have a table rotation: every day we
>> create a new table to contain the data for the next day, and we drop the
>> oldest temporary table.
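>>
>> Roughly, in CQL terms (table names and schema below are invented
>> placeholders, not our real tables):
>>
>>     -- each day: create tomorrow's table and drop the oldest one
>>     CREATE TABLE tmp.data_20190209 (id uuid PRIMARY KEY, payload blob);
>>     DROP TABLE tmp.data_20190202;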
>>
>>
>>
>> This means that when the node starts to bootstrap, it will ask other nodes
>> for data that will almost certainly be dropped before the bootstrap process
>> is finished.
>>
>>
>>
>> Did you recheck the commands you used to make sure they are correct?
>>
>> What procedure do you use?
>>
>>
>>
>> Our procedure is:
>>
>>    1. We create a brand new instance (Debian).
>>    2. We install Cassandra.
>>    3. We stop the default Cassandra instance (launched by the Debian package).
>>    4. We empty these directories:
>>    /var/lib/cassandra/commitlog
>>    /var/lib/cassandra/data
>>    /var/lib/cassandra/saved_caches
>>    5. We put our configuration in place of the default one.
>>    6. We start Cassandra.
>>
>> If after 3 days the node still hasn't joined the cluster, we check
>> `nodetool netstats` to see if the node is still streaming data.
>> If it is not, we launch `nodetool bootstrap resume` on the instance.
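>>
>> Concretely, that check looks something like this (a rough sketch; the grep
>> just picks out the streaming summary lines):
>>
>>     # on the joining node: is anything still being streamed?
>>     nodetool netstats -H | grep -i receiving
>>     # if nothing is streaming but nodetool status still shows the node as UJ:
>>     nodetool bootstrap resume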
>>
>>
>>
>> Thank you for your interest in our issue!
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>>
>>
>> *From:* Léo FERLIN SUTTON [mailto:lferlin@mailjet.com.INVALID]
>> *Sent:* Thursday, February 07, 2019 9:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: [EXTERNAL] Re: Bootstrap keeps failing
>>
>>
>>
>> Thank you for the recommendation.
>>
>>
>>
>> We are already using DataStax's recommended settings for tcp_keepalive.
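>>
>> If I recall the values from that DataStax page correctly (worth
>> double-checking against the link Sean posted below), they are along these
>> lines:
>>
>>     sysctl -w net.ipv4.tcp_keepalive_time=60 \
>>               net.ipv4.tcp_keepalive_probes=3 \
>>               net.ipv4.tcp_keepalive_intvl=10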
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R <
>> SEAN_R_DURITY@homedepot.com> wrote:
>>
>> I have seen unreliable streaming (streaming that doesn’t finish) because
>> of TCP timeouts from firewalls or switches. The default tcp_keepalive
>> kernel parameters are usually not tuned for that. See
>> https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
>> for more details. These “remote” timeouts are difficult to detect or prove
>> if you don’t have access to the intermediate network equipment.
>>
>>
>>
>> Sean Durity
>>
>> *From:* Léo FERLIN SUTTON <lf...@mailjet.com.INVALID>
>> *Sent:* Thursday, February 07, 2019 10:26 AM
>> *To:* user@cassandra.apache.org; dinesh.joshi@yahoo.com
>> *Subject:* [EXTERNAL] Re: Bootstrap keeps failing
>>
>>
>>
>> Hello !
>>
>> Thank you for your answers.
>>
>>
>>
>> So I have tried, multiple times, to start bootstrapping from scratch. I
>> often have the same problem (on other nodes as well) but sometimes it works
>> and I can move on to another node.
>>
>>
>>
>> I have attached a jstack dump and some logs.
>>
>>
>>
>> Our node was shut down at around 97% disk space used.
>>
>> I turned it back on and it started the bootstrap process again.
>>
>>
>>
>> The log file is the log from this attempt, same for the thread dump.
>>
>>
>>
>> Small warning: I have somewhat anonymised the log files, so there may be
>> some inconsistencies.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>
>>
>> On Thu, Feb 7, 2019 at 8:13 AM dinesh.joshi@yahoo.com.INVALID <
>> dinesh.joshi@yahoo.com.invalid <di...@yahoo.com.invalid>> wrote:
>>
>> Would it be possible for you to take a thread dump & logs and share them?
>>
>>
>>
>> Dinesh
>>
>>
>>
>>
>>
>> On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON <
>> lferlin@mailjet.com.INVALID> wrote:
>>
>>
>>
>>
>>
>> Hello !
>>
>>
>>
>> I am having a recurrent problem when trying to bootstrap a few new nodes.
>>
>>
>>
>> Some general info :
>>
>>    - I am running cassandra 3.0.17
>>    - We have about 30 nodes in our cluster
>>    - All healthy nodes have between 60% to 90% used disk space on
>>    /var/lib/cassandra
>>
>> So I create a new node and let auto_bootstrap do its job. After a few
>> days the bootstrapping node stops streaming new data but is still not a
>> member of the cluster.
>>
>>
>>
>> `nodetool status` says the node is still joining.
>>
>>
>>
>> When this happens I run `nodetool bootstrap resume`. This usually ends
>> in one of two ways:
>>
>>    1. The node fills up to 100% disk space and crashes.
>>    2. The bootstrap resume finishes with errors
>>
>> When I look at `nodetool netstats -H` it looks like `bootstrap resume`
>> does not resume but restarts a full transfer of all data from every node.
>>
>>
>>
>> This is the output I get from `nodetool bootstrap resume`:
>>
>> [2019-02-06 01:39:14,369] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:16,821] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:17,003] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress:
>> 2113%)
>>
>> [2019-02-06 01:41:15,160] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:02,864] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:09,284] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:10,522] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:10,622] received file
>> /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
>> (progress: 2113%)
>>
>> [2019-02-06 01:42:11,925] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
>> (progress: 2114%)
>>
>> [2019-02-06 01:42:14,887] received file
>> /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db
>> (progress: 2114%)
>>
>> [2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress:
>> 2114%)
>>
>> [2019-02-06 01:42:14,980] Stream failed
>>
>> [2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
>>
>> [2019-02-06 01:42:14,982] Resume bootstrap complete
>>
>>
>>
>> The bootstrap `progress` goes way over 100% and eventually fails.
>>
>>
>>
>>
>>
>> Right now I have a node with this output from `nodetool status` :
>>
>> `UJ  10.16.XX.YYY  2.93 TB    256          ?
>>  5788f061-a3c0-46af-b712-ebeecd397bf7  c`
>>
>>
>>
>> It is almost filled with data, yet if I look at `nodetool netstats` :
>>
>>         Receiving 480 files, 325.39 GB total. Already received 5 files,
>> 68.32 MB total
>>         Receiving 499 files, 328.96 GB total. Already received 1 files,
>> 1.32 GB total
>>         Receiving 506 files, 345.33 GB total. Already received 6 files,
>> 24.19 MB total
>>         Receiving 362 files, 206.73 GB total. Already received 7 files,
>> 34 MB total
>>         Receiving 424 files, 281.25 GB total. Already received 1 files,
>> 1.3 GB total
>>         Receiving 581 files, 349.26 GB total. Already received 8 files,
>> 45.96 MB total
>>         Receiving 443 files, 337.26 GB total. Already received 6 files,
>> 96.15 MB total
>>         Receiving 424 files, 275.23 GB total. Already received 5 files,
>> 42.67 MB total
>>
>>
>>
>> It is trying to pull all the data again.
>>
>>
>>
>> Am I missing something about the way `nodetool bootstrap resume` is
>> supposed to be used ?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Leo
>>
>>

Re: Bootstrap keeps failing

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
We are experiencing bootstrap problems with 2.2.x and 3.11.x when
clusters hit 30 nodes, across multiple datacenters.

We will try some of the stuff here and hopefully it helps us.
