Posted to user@cassandra.apache.org by Stefano Ortolani <os...@gmail.com> on 2017/10/13 12:28:56 UTC

Re: Bootstrapping a node fails because of compactions not keeping up

I have been trying to add another node to the cluster (after upgrading to
3.0.15), and I just noticed through "nodetool netstats" that all nodes have
been streaming to the joining node approximately 1/3 of their SSTables,
basically their whole primary range (we are using RF=3).

Is this expected/normal?
I was under the impression that only the necessary SSTables would be
streamed...

Thanks for the help,
Stefano
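
A related sanity check (a sketch, not part of the original message): passing a
keyspace name to "nodetool status" reports effective ownership per node,
taking replication factor and rack placement into account, which helps
estimate how much data a joining node should expect to receive. keyspace_1 is
just the anonymized placeholder name used later in this thread.

    # Effective ownership for one keyspace (RF and racks considered);
    # without a keyspace argument the Owns column is shown as "?".
    nodetool status keyspace_1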


On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com> wrote:

> But if it also streams, it means I'd still be under-pressure if I am not
>> mistaken. I am under the assumption that the compactions are the by-product
>> of streaming too many SStables at the same time, and not because of my
>> current write load.
>>
> Ah yeah I wasn't thinking about the capacity problem, more of the
> performance impact from the node being backed up with compactions. If you
> haven't already, you should try disable stcs in l0 on the joining node. You
> will likely still need to do a lot of compactions, but generally they
> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>
>>  I just noticed you were mentioning L1 tables too. Why would that affect
>> the disk footprint?
>
> If you've been doing a lot of STCS in L0, you generally end up with some
> large SSTables. These will eventually have to be compacted with L1. Could
> also be suffering the problem of streamed SSTables causing large
> cross-level compactions in the higher levels as well.
> ​
>
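
The flag mentioned above is a JVM system property. One way to set it on the
joining node before starting the bootstrap is to append it to
cassandra-env.sh (a sketch; file locations and service name vary by install,
the paths below assume a package install):

    # Disable the STCS-in-L0 fallback for LCS tables on this node only
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    # restart the node so the option takes effect (service name may differ)
    sudo systemctl restart cassandra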

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Stefano Ortolani <os...@gmail.com>.
Nice catch!
I’d totally overlooked it.

Thanks a lot!
Stefano

On Sun, 15 Oct 2017 at 22:14, Jeff Jirsa <jj...@gmail.com> wrote:

> (Should still be able to complete, unless you’re running out of disk or
> memory or similar, but that’s why it’s streaming more than you expect)
>
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:51 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>
> I
> You’re adding the new node as rac3
>
> The rack aware policy is going to make sure you get the rack diversity you
> asked for by making sure one replica of each partition is in rac3, which is
> going to blow up that instance
>
>
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <os...@gmail.com> wrote:
>
> Hi Jeff,
>
> this my third attempt bootstrapping the node so I tried several tricks
> that might partially explain the output I am posting.
>
> * To make the bootstrap incremental, I have been throttling the streams on
> all nodes to 1Mbits. I have selectively unthrottling one node at a time
> hoping that would unlock some routines compacting away redundant data
> (you'll see that nodetool netstats reports back fewer nodes than nodetool
> status).
> * Since compactions have had the tendency of getting stuck (hundreds
> pending but none executing) in previous bootstraps, I've tried issuing a
> manual "nodetool compact" on the boostrapping node.
>
> Having said that, this is the output of the commands,
>
> Thanks a lot,
> Stefano
>
> *nodetool status*
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens       Owns    Host ID
>             Rack
> UN  X.Y.33.8   342.4 GB   256          ?
> afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256          ?
> 00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256          ?
> 1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256          ?
> 13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256          ?
> aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256          ?
> 43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256          ?
> 4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256          ?
> 03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256          ?
> 24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256          ?
> 26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256          ?
> e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256          ?
> 09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256          ?
> feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256          ?
> cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
>
>
> *nodetool netstats -H | grep "Already received" -B 1*
>     /X.Y.81.4
>         Receiving 1992 files, 103.68 GB total. Already received 515 files,
> 23.32 GB total
> --
>     /X.Y.81.7
>         Receiving 1936 files, 89.35 GB total. Already received 554 files,
> 23.32 GB total
> --
>     /X.Y.81.5
>         Receiving 1926 files, 95.69 GB total. Already received 545 files,
> 23.31 GB total
> --
>     /X.Y.81.2
>         Receiving 1992 files, 100.81 GB total. Already received 537 files,
> 23.32 GB total
> --
>     /X.Y.81.3
>         Receiving 1958 files, 104.72 GB total. Already received 503 files,
> 23.31 GB total
> --
>     /X.Y.81.1
>         Receiving 2034 files, 104.51 GB total. Already received 520 files,
> 23.33 GB total
> --
>     /X.Y.81.6
>         Receiving 1962 files, 96.19 GB total. Already received 547 files,
> 23.32 GB total
> --
>     /X.Y.33.5
>         Receiving 2121 files, 97.44 GB total. Already received 601 files,
> 23.32 GB total
>
> *nodetool tpstats*
> Pool Name                    Active   Pending      Completed   Blocked
>  All time blocked
> MutationStage                     0         0      828367015         0
>             0
> ViewMutationStage                 0         0              0         0
>             0
> ReadStage                         0         0              0         0
>             0
> RequestResponseStage              0         0             13         0
>             0
> ReadRepairStage                   0         0              0         0
>             0
> CounterMutationStage              0         0              0         0
>             0
> MiscStage                         0         0              0         0
>             0
> CompactionExecutor                1         1          12150         0
>             0
> MemtableReclaimMemory             0         0           7368         0
>             0
> PendingRangeCalculator            0         0             14         0
>             0
> GossipStage                       0         0         599329         0
>             0
> SecondaryIndexManagement          0         0              0         0
>             0
> HintsDispatcher                   0         0              0         0
>             0
> MigrationStage                    0         0             27         0
>             0
> MemtablePostFlush                 0         0           8112         0
>             0
> ValidationExecutor                0         0              0         0
>             0
> Sampler                           0         0              0         0
>             0
> MemtableFlushWriter               0         0           7368         0
>             0
> InternalResponseStage             0         0             25         0
>             0
> AntiEntropyStage                  0         0              0         0
>             0
> CacheCleanupExecutor              0         0              0         0
>             0
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     1
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
>
> *nodetool compactionstats -H*
> pending tasks: 776
>                                      id   compaction type         keyspace
>                   table   completed     total    unit   progress
>    24d039f2-b1e6-11e7-ac57-3d25e38b2f5c        Compaction   keyspace_1
> table_1     4.85 GB   7.67 GB   bytes     63.25%
> Active compaction remaining time :        n/a
>
>
> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>
>> Can you post (anonymize as needed) nodetool status, nodetool netstats,
>> nodetool tpstats, and nodetool compctionstats ?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>
>> Hi Jeff,
>>
>> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>
>> Stefano
>>
>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>
>>> What version?
>>>
>>> Single disk or JBOD?
>>>
>>> Vnodes?
>>>
>>> --
>>> Jeff Jirsa
>>>
>>>
>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck
>>> so far.
>>> Based on the source code it seems that this option doesn't affect
>>> compactions while bootstrapping.
>>>
>>> I am getting quite confused as it seems I am not able to bootstrap a
>>> node if I don't have at least 6/7 times the disk space used by other nodes.
>>> This is weird. The host I am bootstrapping is using a SSD. Also
>>> compaction throughput is unthrottled (set to 0) and the compacting threads
>>> are set to 8.
>>> Nevertheless, primary ranges from other nodes are being streamed, but
>>> data is never compacted away.
>>>
>>> Does anybody know anything else I could try?
>>>
>>> Cheers,
>>> Stefano
>>>
>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com>
>>> wrote:
>>>
>>>> Other little update: at the same time I see the number of pending tasks
>>>> stuck (in this case at 1847); restarting the node doesn't help, so I can't
>>>> really force the node to "digest" all those compactions. In the meanwhile
>>>> the disk occupied is already twice the average load I have on other nodes.
>>>>
>>>> Feeling more and more puzzled here :S
>>>>
>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com>
>>>> wrote:
>>>>
>>>>> I have been trying to add another node to the cluster (after upgrading
>>>>> to 3.0.15) and I just noticed through "nodetool netstats" that all nodes
>>>>> have been streaming to the joining node approx 1/3 of their SSTables,
>>>>> basically their whole primary range (using RF=3)?
>>>>>
>>>>> Is this expected/normal?
>>>>> I was under the impression only the necessary SSTables were going to
>>>>> be streamed...
>>>>>
>>>>> Thanks for the help,
>>>>> Stefano
>>>>>
>>>>>
>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com>
>>>>> wrote:
>>>>>
>>>>>> But if it also streams, it means I'd still be under-pressure if I am
>>>>>>> not mistaken. I am under the assumption that the compactions are the
>>>>>>> by-product of streaming too many SStables at the same time, and not because
>>>>>>> of my current write load.
>>>>>>>
>>>>>> Ah yeah I wasn't thinking about the capacity problem, more of the
>>>>>> performance impact from the node being backed up with compactions. If you
>>>>>> haven't already, you should try disable stcs in l0 on the joining node. You
>>>>>> will likely still need to do a lot of compactions, but generally they
>>>>>> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>
>>>>>>>  I just noticed you were mentioning L1 tables too. Why would that
>>>>>>> affect the disk footprint?
>>>>>>
>>>>>> If you've been doing a lot of STCS in L0, you generally end up with
>>>>>> some large SSTables. These will eventually have to be compacted with L1.
>>>>>> Could also be suffering the problem of streamed SSTables causing large
>>>>>> cross-level compactions in the higher levels as well.
>>>>>> ​
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Jeff Jirsa <jj...@gmail.com>.
(Should still be able to complete, unless you’re running out of disk or memory or similar, but that’s why it’s streaming more than you expect)


-- 
Jeff Jirsa


> On Oct 15, 2017, at 1:51 PM, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> I
> You’re adding the new node as rac3
> 
> The rack aware policy is going to make sure you get the rack diversity you asked for by making sure one replica of each partition is in rac3, which is going to blow up that instance
> 
> 
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <os...@gmail.com> wrote:
>> 
>> Hi Jeff,
>> 
>> this my third attempt bootstrapping the node so I tried several tricks that might partially explain the output I am posting.
>> 
>> * To make the bootstrap incremental, I have been throttling the streams on all nodes to 1Mbits. I have selectively unthrottling one node at a time hoping that would unlock some routines compacting away redundant data (you'll see that nodetool netstats reports back fewer nodes than nodetool status).
>> * Since compactions have had the tendency of getting stuck (hundreds pending but none executing) in previous bootstraps, I've tried issuing a manual "nodetool compact" on the boostrapping node.
>> 
>> Having said that, this is the output of the commands,
>> 
>> Thanks a lot,
>> Stefano
>> 
>> nodetool status
>> Datacenter: DC1
>> ===============
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens       Owns    Host ID                               Rack
>> UN  X.Y.33.8   342.4 GB   256          ?       afaae414-30cc-439d-9785-1b7d35f74529  RAC1
>> UN  X.Y.81.4   325.98 GB  256          ?       00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
>> UN  X.Y.33.4   348.81 GB  256          ?       1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
>> UN  X.Y.33.5   384.99 GB  256          ?       13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
>> UN  X.Y.81.5   336.27 GB  256          ?       aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
>> UN  X.Y.33.6   377.22 GB  256          ?       43a393ba-6805-4e33-866f-124360174b28  RAC1
>> UN  X.Y.81.6   329.61 GB  256          ?       4c3c64ae-ef4f-4986-9341-573830416997  RAC2
>> UN  X.Y.33.7   344.25 GB  256          ?       03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
>> UN  X.Y.81.7   324.93 GB  256          ?       24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
>> UN  X.Y.81.1   323.8 GB   256          ?       26244100-0565-4567-ae9c-0fc5346f5558  RAC2
>> UJ  X.Y.177.2  724.5 GB   256          ?       e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
>> UN  X.Y.81.2   337.83 GB  256          ?       09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
>> UN  X.Y.81.3   326.4 GB   256          ?       feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
>> UN  X.Y.33.3   350.4 GB   256          ?       cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
>> 
>> 
>> nodetool netstats -H | grep "Already received" -B 1
>>     /X.Y.81.4
>>         Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
>> --
>>     /X.Y.81.7
>>         Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
>> --
>>     /X.Y.81.5
>>         Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
>> --
>>     /X.Y.81.2
>>         Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
>> --
>>     /X.Y.81.3
>>         Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
>> --
>>     /X.Y.81.1
>>         Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
>> --
>>     /X.Y.81.6
>>         Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
>> --
>>     /X.Y.33.5
>>         Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
>> 
>> nodetool tpstats
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> MutationStage                     0         0      828367015         0                 0
>> ViewMutationStage                 0         0              0         0                 0
>> ReadStage                         0         0              0         0                 0
>> RequestResponseStage              0         0             13         0                 0
>> ReadRepairStage                   0         0              0         0                 0
>> CounterMutationStage              0         0              0         0                 0
>> MiscStage                         0         0              0         0                 0
>> CompactionExecutor                1         1          12150         0                 0
>> MemtableReclaimMemory             0         0           7368         0                 0
>> PendingRangeCalculator            0         0             14         0                 0
>> GossipStage                       0         0         599329         0                 0
>> SecondaryIndexManagement          0         0              0         0                 0
>> HintsDispatcher                   0         0              0         0                 0
>> MigrationStage                    0         0             27         0                 0
>> MemtablePostFlush                 0         0           8112         0                 0
>> ValidationExecutor                0         0              0         0                 0
>> Sampler                           0         0              0         0                 0
>> MemtableFlushWriter               0         0           7368         0                 0
>> InternalResponseStage             0         0             25         0                 0
>> AntiEntropyStage                  0         0              0         0                 0
>> CacheCleanupExecutor              0         0              0         0                 0
>> 
>> Message type           Dropped
>> READ                         0
>> RANGE_SLICE                  0
>> _TRACE                       0
>> HINT                         0
>> MUTATION                     1
>> COUNTER_MUTATION             0
>> BATCH_STORE                  0
>> BATCH_REMOVE                 0
>> REQUEST_RESPONSE             0
>> PAGED_RANGE                  0
>> READ_REPAIR                  0
>> 
>> nodetool compactionstats -H
>> pending tasks: 776
>>                                      id   compaction type         keyspace                   table   completed     total    unit   progress
>>    24d039f2-b1e6-11e7-ac57-3d25e38b2f5c        Compaction   keyspace_1   table_1     4.85 GB   7.67 GB   bytes     63.25%
>> Active compaction remaining time :        n/a
>> 
>> 
>>> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compctionstats ?
>>> 
>>> -- 
>>> Jeff Jirsa
>>> 
>>> 
>>>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>> 
>>>> Hi Jeff,
>>>> 
>>>> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>>> 
>>>> Stefano
>>>> 
>>>>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>>>> What version?
>>>>> 
>>>>> Single disk or JBOD?
>>>>> 
>>>>> Vnodes?
>>>>> 
>>>>> -- 
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far. 
>>>>>> Based on the source code it seems that this option doesn't affect compactions while bootstrapping.
>>>>>> 
>>>>>> I am getting quite confused as it seems I am not able to bootstrap a node if I don't have at least 6/7 times the disk space used by other nodes.
>>>>>> This is weird. The host I am bootstrapping is using a SSD. Also compaction throughput is unthrottled (set to 0) and the compacting threads are set to 8.
>>>>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>>>>> 
>>>>>> Does anybody know anything else I could try?
>>>>>> 
>>>>>> Cheers,
>>>>>> Stefano
>>>>>> 
>>>>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>>>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meanwhile the disk occupied is already twice the average load I have on other nodes.
>>>>>>> 
>>>>>>> Feeling more and more puzzled here :S
>>>>>>> 
>>>>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15) and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approx 1/3 of their SSTables, basically their whole primary range (using RF=3)?
>>>>>>>> 
>>>>>>>> Is this expected/normal? 
>>>>>>>> I was under the impression only the necessary SSTables were going to be streamed...
>>>>>>>> 
>>>>>>>> Thanks for the help,
>>>>>>>> Stefano
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com> wrote:
>>>>>>>>>> But if it also streams, it means I'd still be under-pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SStables at the same time, and not because of my current write load.
>>>>>>>>> 
>>>>>>>>> Ah yeah I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disable stcs in l0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>>>>>  I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>>>>> 
>>>>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. Could also be suffering the problem of streamed SSTables causing large cross-level compactions in the higher levels as well.
>>>>>>>>> ​
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>> 

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Jeff Jirsa <jj...@gmail.com>.
You’re adding the new node as rac3.

The rack-aware policy is going to give you the rack diversity you asked for by placing one replica of each partition in rac3, which is going to blow up that instance
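
A quick way to confirm this on the joining node (a sketch; the file path
assumes a package install with GossipingPropertyFileSnitch, adjust as needed):

    # Which keyspaces use NetworkTopologyStrategy (rack-aware placement)?
    # With RF=3 and a single node in RAC3, that node ends up holding one
    # replica of every partition in such keyspaces.
    cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"

    # The rack a node advertises comes from the snitch configuration:
    grep -E '^(dc|rack)=' /etc/cassandra/cassandra-rackdc.properties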



-- 
Jeff Jirsa


> On Oct 15, 2017, at 1:42 PM, Stefano Ortolani <os...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> this my third attempt bootstrapping the node so I tried several tricks that might partially explain the output I am posting.
> 
> * To make the bootstrap incremental, I have been throttling the streams on all nodes to 1Mbits. I have selectively unthrottling one node at a time hoping that would unlock some routines compacting away redundant data (you'll see that nodetool netstats reports back fewer nodes than nodetool status).
> * Since compactions have had the tendency of getting stuck (hundreds pending but none executing) in previous bootstraps, I've tried issuing a manual "nodetool compact" on the boostrapping node.
> 
> Having said that, this is the output of the commands,
> 
> Thanks a lot,
> Stefano
> 
> nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens       Owns    Host ID                               Rack
> UN  X.Y.33.8   342.4 GB   256          ?       afaae414-30cc-439d-9785-1b7d35f74529  RAC1
> UN  X.Y.81.4   325.98 GB  256          ?       00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
> UN  X.Y.33.4   348.81 GB  256          ?       1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
> UN  X.Y.33.5   384.99 GB  256          ?       13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
> UN  X.Y.81.5   336.27 GB  256          ?       aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
> UN  X.Y.33.6   377.22 GB  256          ?       43a393ba-6805-4e33-866f-124360174b28  RAC1
> UN  X.Y.81.6   329.61 GB  256          ?       4c3c64ae-ef4f-4986-9341-573830416997  RAC2
> UN  X.Y.33.7   344.25 GB  256          ?       03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
> UN  X.Y.81.7   324.93 GB  256          ?       24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
> UN  X.Y.81.1   323.8 GB   256          ?       26244100-0565-4567-ae9c-0fc5346f5558  RAC2
> UJ  X.Y.177.2  724.5 GB   256          ?       e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
> UN  X.Y.81.2   337.83 GB  256          ?       09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
> UN  X.Y.81.3   326.4 GB   256          ?       feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
> UN  X.Y.33.3   350.4 GB   256          ?       cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1
> 
> 
> nodetool netstats -H | grep "Already received" -B 1
>     /X.Y.81.4
>         Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
> --
>     /X.Y.81.7
>         Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
> --
>     /X.Y.81.5
>         Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
> --
>     /X.Y.81.2
>         Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
> --
>     /X.Y.81.3
>         Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
> --
>     /X.Y.81.1
>         Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
> --
>     /X.Y.81.6
>         Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
> --
>     /X.Y.33.5
>         Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total
> 
> nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0      828367015         0                 0
> ViewMutationStage                 0         0              0         0                 0
> ReadStage                         0         0              0         0                 0
> RequestResponseStage              0         0             13         0                 0
> ReadRepairStage                   0         0              0         0                 0
> CounterMutationStage              0         0              0         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                1         1          12150         0                 0
> MemtableReclaimMemory             0         0           7368         0                 0
> PendingRangeCalculator            0         0             14         0                 0
> GossipStage                       0         0         599329         0                 0
> SecondaryIndexManagement          0         0              0         0                 0
> HintsDispatcher                   0         0              0         0                 0
> MigrationStage                    0         0             27         0                 0
> MemtablePostFlush                 0         0           8112         0                 0
> ValidationExecutor                0         0              0         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0           7368         0                 0
> InternalResponseStage             0         0             25         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> 
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> HINT                         0
> MUTATION                     1
> COUNTER_MUTATION             0
> BATCH_STORE                  0
> BATCH_REMOVE                 0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> 
> nodetool compactionstats -H
> pending tasks: 776
>                                      id   compaction type         keyspace                   table   completed     total    unit   progress
>    24d039f2-b1e6-11e7-ac57-3d25e38b2f5c        Compaction   keyspace_1   table_1     4.85 GB   7.67 GB   bytes     63.25%
> Active compaction remaining time :        n/a
> 
> 
>> On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>> Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compctionstats ?
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>> 
>>> Hi Jeff,
>>> 
>>> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>>> 
>>> Stefano
>>> 
>>>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>>>> What version?
>>>> 
>>>> Single disk or JBOD?
>>>> 
>>>> Vnodes?
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far. 
>>>>> Based on the source code it seems that this option doesn't affect compactions while bootstrapping.
>>>>> 
>>>>> I am getting quite confused as it seems I am not able to bootstrap a node if I don't have at least 6/7 times the disk space used by other nodes.
>>>>> This is weird. The host I am bootstrapping is using a SSD. Also compaction throughput is unthrottled (set to 0) and the compacting threads are set to 8.
>>>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>>>> 
>>>>> Does anybody know anything else I could try?
>>>>> 
>>>>> Cheers,
>>>>> Stefano
>>>>> 
>>>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meanwhile the disk occupied is already twice the average load I have on other nodes.
>>>>>> 
>>>>>> Feeling more and more puzzled here :S
>>>>>> 
>>>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15) and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approx 1/3 of their SSTables, basically their whole primary range (using RF=3)?
>>>>>>> 
>>>>>>> Is this expected/normal? 
>>>>>>> I was under the impression only the necessary SSTables were going to be streamed...
>>>>>>> 
>>>>>>> Thanks for the help,
>>>>>>> Stefano
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com> wrote:
>>>>>>>>> But if it also streams, it means I'd still be under-pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SStables at the same time, and not because of my current write load.
>>>>>>>> 
>>>>>>>> Ah yeah I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disable stcs in l0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>>>>  I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>>>> 
>>>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. Could also be suffering the problem of streamed SSTables causing large cross-level compactions in the higher levels as well.
>>>>>>>> ​
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
> 

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Stefano Ortolani <os...@gmail.com>.
Hi Jeff,

this is my third attempt at bootstrapping the node, so I have tried several
tricks that might partially explain the output I am posting.

* To make the bootstrap incremental, I have been throttling the streams on
all nodes to 1 Mbit/s, and I have been selectively unthrottling one node at a
time, hoping that would let the joining node compact away redundant data in
between (you'll see that nodetool netstats reports back fewer nodes than
nodetool status); the nodetool knob for this is sketched after this list.
* Since compactions have tended to get stuck in previous bootstraps (hundreds
pending but none executing), I've tried issuing a manual "nodetool compact"
on the bootstrapping node.

Having said that, here is the output of the commands:

Thanks a lot,
Stefano

*nodetool status*
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  X.Y.33.8   342.4 GB   256          ?       afaae414-30cc-439d-9785-1b7d35f74529  RAC1
UN  X.Y.81.4   325.98 GB  256          ?       00a96a5d-3bfd-497f-91f3-973b75146162  RAC2
UN  X.Y.33.4   348.81 GB  256          ?       1d8e6588-e25b-456a-8f29-0dedc35bda8e  RAC1
UN  X.Y.33.5   384.99 GB  256          ?       13d03fd2-7528-466b-b4b5-1b46508e2465  RAC1
UN  X.Y.81.5   336.27 GB  256          ?       aa161400-6c0e-4bde-bcb3-b2e7e7840196  RAC2
UN  X.Y.33.6   377.22 GB  256          ?       43a393ba-6805-4e33-866f-124360174b28  RAC1
UN  X.Y.81.6   329.61 GB  256          ?       4c3c64ae-ef4f-4986-9341-573830416997  RAC2
UN  X.Y.33.7   344.25 GB  256          ?       03d81879-dc0d-4118-92e3-b3013dfde480  RAC1
UN  X.Y.81.7   324.93 GB  256          ?       24bbf4b6-9427-4ed1-a751-a55cc24cc756  RAC2
UN  X.Y.81.1   323.8 GB   256          ?       26244100-0565-4567-ae9c-0fc5346f5558  RAC2
UJ  X.Y.177.2  724.5 GB   256          ?       e269a06b-c0c0-43a6-922c-f04c98898e0d  RAC3
UN  X.Y.81.2   337.83 GB  256          ?       09e29429-15ff-44d6-9742-ac95c83c4d9e  RAC2
UN  X.Y.81.3   326.4 GB   256          ?       feaa7b27-7ab8-4fc2-b64a-c9df3dd86d97  RAC2
UN  X.Y.33.3   350.4 GB   256          ?       cc115991-b7e7-4d06-87b5-8ad5efd45da5  RAC1


*nodetool netstats -H | grep "Already received" -B 1*
    /X.Y.81.4
        Receiving 1992 files, 103.68 GB total. Already received 515 files, 23.32 GB total
--
    /X.Y.81.7
        Receiving 1936 files, 89.35 GB total. Already received 554 files, 23.32 GB total
--
    /X.Y.81.5
        Receiving 1926 files, 95.69 GB total. Already received 545 files, 23.31 GB total
--
    /X.Y.81.2
        Receiving 1992 files, 100.81 GB total. Already received 537 files, 23.32 GB total
--
    /X.Y.81.3
        Receiving 1958 files, 104.72 GB total. Already received 503 files, 23.31 GB total
--
    /X.Y.81.1
        Receiving 2034 files, 104.51 GB total. Already received 520 files, 23.33 GB total
--
    /X.Y.81.6
        Receiving 1962 files, 96.19 GB total. Already received 547 files, 23.32 GB total
--
    /X.Y.33.5
        Receiving 2121 files, 97.44 GB total. Already received 601 files, 23.32 GB total

*nodetool tpstats*
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      828367015         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         0         0              0         0                 0
RequestResponseStage              0         0             13         0                 0
ReadRepairStage                   0         0              0         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                1         1          12150         0                 0
MemtableReclaimMemory             0         0           7368         0                 0
PendingRangeCalculator            0         0             14         0                 0
GossipStage                       0         0         599329         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0              0         0                 0
MigrationStage                    0         0             27         0                 0
MemtablePostFlush                 0         0           8112         0                 0
ValidationExecutor                0         0              0         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           7368         0                 0
InternalResponseStage             0         0             25         0                 0
AntiEntropyStage                  0         0              0         0                 0
CacheCleanupExecutor              0         0              0         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     1
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

*nodetool compactionstats -H*
pending tasks: 776
                                     id   compaction type         keyspace                   table   completed     total    unit   progress
   24d039f2-b1e6-11e7-ac57-3d25e38b2f5c        Compaction   keyspace_1   table_1     4.85 GB   7.67 GB   bytes     63.25%
Active compaction remaining time :        n/a


On Sun, Oct 15, 2017 at 9:27 PM, Jeff Jirsa <jj...@gmail.com> wrote:

> Can you post (anonymize as needed) nodetool status, nodetool netstats,
> nodetool tpstats, and nodetool compctionstats ?
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <os...@gmail.com> wrote:
>
> Hi Jeff,
>
> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
>
> Stefano
>
> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>
>> What version?
>>
>> Single disk or JBOD?
>>
>> Vnodes?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>>
>> Hi all,
>>
>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so
>> far.
>> Based on the source code it seems that this option doesn't affect
>> compactions while bootstrapping.
>>
>> I am getting quite confused as it seems I am not able to bootstrap a node
>> if I don't have at least 6/7 times the disk space used by other nodes.
>> This is weird. The host I am bootstrapping is using a SSD. Also
>> compaction throughput is unthrottled (set to 0) and the compacting threads
>> are set to 8.
>> Nevertheless, primary ranges from other nodes are being streamed, but
>> data is never compacted away.
>>
>> Does anybody know anything else I could try?
>>
>> Cheers,
>> Stefano
>>
>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>>
>>> Other little update: at the same time I see the number of pending tasks
>>> stuck (in this case at 1847); restarting the node doesn't help, so I can't
>>> really force the node to "digest" all those compactions. In the meanwhile
>>> the disk occupied is already twice the average load I have on other nodes.
>>>
>>> Feeling more and more puzzled here :S
>>>
>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com>
>>> wrote:
>>>
>>>> I have been trying to add another node to the cluster (after upgrading
>>>> to 3.0.15) and I just noticed through "nodetool netstats" that all nodes
>>>> have been streaming to the joining node approx 1/3 of their SSTables,
>>>> basically their whole primary range (using RF=3)?
>>>>
>>>> Is this expected/normal?
>>>> I was under the impression only the necessary SSTables were going to be
>>>> streamed...
>>>>
>>>> Thanks for the help,
>>>> Stefano
>>>>
>>>>
>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com>
>>>> wrote:
>>>>
>>>>> But if it also streams, it means I'd still be under-pressure if I am
>>>>>> not mistaken. I am under the assumption that the compactions are the
>>>>>> by-product of streaming too many SStables at the same time, and not because
>>>>>> of my current write load.
>>>>>>
>>>>> Ah yeah I wasn't thinking about the capacity problem, more of the
>>>>> performance impact from the node being backed up with compactions. If you
>>>>> haven't already, you should try disable stcs in l0 on the joining node. You
>>>>> will likely still need to do a lot of compactions, but generally they
>>>>> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>
>>>>>>  I just noticed you were mentioning L1 tables too. Why would that
>>>>>> affect the disk footprint?
>>>>>
>>>>> If you've been doing a lot of STCS in L0, you generally end up with
>>>>> some large SSTables. These will eventually have to be compacted with L1.
>>>>> Could also be suffering the problem of streamed SSTables causing large
>>>>> cross-level compactions in the higher levels as well.
>>>>> ​
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Jeff Jirsa <jj...@gmail.com>.
Can you post (anonymize as needed) nodetool status, nodetool netstats, nodetool tpstats, and nodetool compactionstats?
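
One way to grab all four outputs in a single pass on the joining node (an
illustrative sketch only):

    for cmd in status netstats tpstats compactionstats; do
        echo "== nodetool $cmd =="
        nodetool "$cmd"
    done > diagnostics.txt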

-- 
Jeff Jirsa


> On Oct 15, 2017, at 1:14 PM, Stefano Ortolani <os...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).
> 
> Stefano
> 
>> On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:
>> What version?
>> 
>> Single disk or JBOD?
>> 
>> Vnodes?
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far. 
>>> Based on the source code it seems that this option doesn't affect compactions while bootstrapping.
>>> 
>>> I am getting quite confused as it seems I am not able to bootstrap a node if I don't have at least 6/7 times the disk space used by other nodes.
>>> This is weird. The host I am bootstrapping is using a SSD. Also compaction throughput is unthrottled (set to 0) and the compacting threads are set to 8.
>>> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
>>> 
>>> Does anybody know anything else I could try?
>>> 
>>> Cheers,
>>> Stefano
>>> 
>>>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meanwhile the disk occupied is already twice the average load I have on other nodes.
>>>> 
>>>> Feeling more and more puzzled here :S
>>>> 
>>>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15) and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approx 1/3 of their SSTables, basically their whole primary range (using RF=3)?
>>>>> 
>>>>> Is this expected/normal? 
>>>>> I was under the impression only the necessary SSTables were going to be streamed...
>>>>> 
>>>>> Thanks for the help,
>>>>> Stefano
>>>>> 
>>>>> 
>>>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com> wrote:
>>>>>>> But if it also streams, it means I'd still be under-pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SStables at the same time, and not because of my current write load.
>>>>>> 
>>>>>> Ah yeah I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disable stcs in l0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>>>  I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>>>> 
>>>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. Could also be suffering the problem of streamed SSTables causing large cross-level compactions in the higher levels as well.
>>>>>> ​
>>>>> 
>>>> 
>>> 
> 

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Stefano Ortolani <os...@gmail.com>.
Hi Jeff,

that would be 3.0.15, single disk, vnodes enabled (num_tokens 256).

Stefano

On Sun, Oct 15, 2017 at 9:11 PM, Jeff Jirsa <jj...@gmail.com> wrote:

> What version?
>
> Single disk or JBOD?
>
> Vnodes?
>
> --
> Jeff Jirsa
>
>
> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com> wrote:
>
> Hi all,
>
> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so
> far.
> Based on the source code it seems that this option doesn't affect
> compactions while bootstrapping.
>
> I am getting quite confused as it seems I am not able to bootstrap a node
> if I don't have at least 6/7 times the disk space used by other nodes.
> This is weird. The host I am bootstrapping is using a SSD. Also compaction
> throughput is unthrottled (set to 0) and the compacting threads are set to
> 8.
> Nevertheless, primary ranges from other nodes are being streamed, but data
> is never compacted away.
>
> Does anybody know anything else I could try?
>
> Cheers,
> Stefano
>
> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com>
> wrote:
>
>> Other little update: at the same time I see the number of pending tasks
>> stuck (in this case at 1847); restarting the node doesn't help, so I can't
>> really force the node to "digest" all those compactions. In the meanwhile
>> the disk occupied is already twice the average load I have on other nodes.
>>
>> Feeling more and more puzzled here :S
>>
>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>>
>>> I have been trying to add another node to the cluster (after upgrading
>>> to 3.0.15) and I just noticed through "nodetool netstats" that all nodes
>>> have been streaming to the joining node approx 1/3 of their SSTables,
>>> basically their whole primary range (using RF=3)?
>>>
>>> Is this expected/normal?
>>> I was under the impression only the necessary SSTables were going to be
>>> streamed...
>>>
>>> Thanks for the help,
>>> Stefano
>>>
>>>
>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com>
>>> wrote:
>>>
>>>> But if it also streams, it means I'd still be under-pressure if I am
>>>>> not mistaken. I am under the assumption that the compactions are the
>>>>> by-product of streaming too many SStables at the same time, and not because
>>>>> of my current write load.
>>>>>
>>>> Ah yeah I wasn't thinking about the capacity problem, more of the
>>>> performance impact from the node being backed up with compactions. If you
>>>> haven't already, you should try disable stcs in l0 on the joining node. You
>>>> will likely still need to do a lot of compactions, but generally they
>>>> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>
>>>>>  I just noticed you were mentioning L1 tables too. Why would that
>>>>> affect the disk footprint?
>>>>
>>>> If you've been doing a lot of STCS in L0, you generally end up with
>>>> some large SSTables. These will eventually have to be compacted with L1.
>>>> Could also be suffering the problem of streamed SSTables causing large
>>>> cross-level compactions in the higher levels as well.
>>>> ​
>>>>
>>>
>>>
>>
>

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Jeff Jirsa <jj...@gmail.com>.
What version?

Single disk or JBOD?

Vnodes?

-- 
Jeff Jirsa


> On Oct 15, 2017, at 12:49 PM, Stefano Ortolani <os...@gmail.com> wrote:
> 
> Hi all,
> 
> I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so far. 
> Based on the source code it seems that this option doesn't affect compactions while bootstrapping.
> 
> I am getting quite confused as it seems I am not able to bootstrap a node if I don't have at least 6/7 times the disk space used by other nodes.
> This is weird. The host I am bootstrapping is using a SSD. Also compaction throughput is unthrottled (set to 0) and the compacting threads are set to 8.
> Nevertheless, primary ranges from other nodes are being streamed, but data is never compacted away.
> 
> Does anybody know anything else I could try?
> 
> Cheers,
> Stefano
> 
>> On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com> wrote:
>> Other little update: at the same time I see the number of pending tasks stuck (in this case at 1847); restarting the node doesn't help, so I can't really force the node to "digest" all those compactions. In the meanwhile the disk occupied is already twice the average load I have on other nodes.
>> 
>> Feeling more and more puzzled here :S
>> 
>>> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com> wrote:
>>> I have been trying to add another node to the cluster (after upgrading to 3.0.15) and I just noticed through "nodetool netstats" that all nodes have been streaming to the joining node approx 1/3 of their SSTables, basically their whole primary range (using RF=3)?
>>> 
>>> Is this expected/normal? 
>>> I was under the impression only the necessary SSTables were going to be streamed...
>>> 
>>> Thanks for the help,
>>> Stefano
>>> 
>>> 
>>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com> wrote:
>>>>> But if it also streams, it means I'd still be under-pressure if I am not mistaken. I am under the assumption that the compactions are the by-product of streaming too many SStables at the same time, and not because of my current write load.
>>>> 
>>>> Ah yeah I wasn't thinking about the capacity problem, more of the performance impact from the node being backed up with compactions. If you haven't already, you should try disable stcs in l0 on the joining node. You will likely still need to do a lot of compactions, but generally they should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>>>  I just noticed you were mentioning L1 tables too. Why would that affect the disk footprint?
>>>> 
>>>> If you've been doing a lot of STCS in L0, you generally end up with some large SSTables. These will eventually have to be compacted with L1. Could also be suffering the problem of streamed SSTables causing large cross-level compactions in the higher levels as well.
>>>> ​
>>> 
>> 
> 

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Stefano Ortolani <os...@gmail.com>.
Hi all,

I have been trying "-Dcassandra.disable_stcs_in_l0=true", but no luck so
far.
Based on the source code it seems that this option doesn't affect
compactions while bootstrapping.

I am getting quite confused, as it seems I am not able to bootstrap a node
unless I have at least 6-7 times the disk space used by the other nodes.
This is weird. The host I am bootstrapping is using an SSD, compaction
throughput is unthrottled (set to 0), and the number of compaction threads is
set to 8.
Nevertheless, primary ranges from other nodes are being streamed, but data
is never compacted away.
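
For reference, a sketch of where those two settings live (the yaml path
assumes a package install; adjust as needed):

    nodetool setcompactionthroughput 0   # MB/s; 0 means no throttling
    nodetool getcompactionthroughput     # confirm the current value
    # concurrent compactors are set in cassandra.yaml and need a restart:
    grep concurrent_compactors /etc/cassandra/cassandra.yaml
    # concurrent_compactors: 8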

Does anybody know anything else I could try?

Cheers,
Stefano

On Fri, Oct 13, 2017 at 3:58 PM, Stefano Ortolani <os...@gmail.com>
wrote:

> Other little update: at the same time I see the number of pending tasks
> stuck (in this case at 1847); restarting the node doesn't help, so I can't
> really force the node to "digest" all those compactions. In the meanwhile
> the disk occupied is already twice the average load I have on other nodes.
>
> Feeling more and more puzzled here :S
>
> On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com>
> wrote:
>
>> I have been trying to add another node to the cluster (after upgrading to
>> 3.0.15) and I just noticed through "nodetool netstats" that all nodes have
>> been streaming to the joining node approx 1/3 of their SSTables, basically
>> their whole primary range (using RF=3)?
>>
>> Is this expected/normal?
>> I was under the impression only the necessary SSTables were going to be
>> streamed...
>>
>> Thanks for the help,
>> Stefano
>>
>>
>> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com>
>> wrote:
>>
>>> But if it also streams, it means I'd still be under-pressure if I am not
>>>> mistaken. I am under the assumption that the compactions are the by-product
>>>> of streaming too many SStables at the same time, and not because of my
>>>> current write load.
>>>>
>>> Ah yeah I wasn't thinking about the capacity problem, more of the
>>> performance impact from the node being backed up with compactions. If you
>>> haven't already, you should try disable stcs in l0 on the joining node. You
>>> will likely still need to do a lot of compactions, but generally they
>>> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>>
>>>>  I just noticed you were mentioning L1 tables too. Why would that
>>>> affect the disk footprint?
>>>
>>> If you've been doing a lot of STCS in L0, you generally end up with some
>>> large SSTables. These will eventually have to be compacted with L1. Could
>>> also be suffering the problem of streamed SSTables causing large
>>> cross-level compactions in the higher levels as well.
>>> ​
>>>
>>
>>
>

Re: Bootstrapping a node fails because of compactions not keeping up

Posted by Stefano Ortolani <os...@gmail.com>.
Other little update: at the same time I see the number of pending tasks
stuck (in this case at 1847); restarting the node doesn't help, so I can't
really force the node to "digest" all those compactions. Meanwhile, the disk
space used is already twice the average load I have on the other nodes.

Feeling more and more puzzled here :S

On Fri, Oct 13, 2017 at 1:28 PM, Stefano Ortolani <os...@gmail.com>
wrote:

> I have been trying to add another node to the cluster (after upgrading to
> 3.0.15) and I just noticed through "nodetool netstats" that all nodes have
> been streaming to the joining node approx 1/3 of their SSTables, basically
> their whole primary range (using RF=3)?
>
> Is this expected/normal?
> I was under the impression only the necessary SSTables were going to be
> streamed...
>
> Thanks for the help,
> Stefano
>
>
> On Wed, Aug 23, 2017 at 1:37 PM, kurt greaves <ku...@instaclustr.com>
> wrote:
>
>> But if it also streams, it means I'd still be under-pressure if I am not
>>> mistaken. I am under the assumption that the compactions are the by-product
>>> of streaming too many SStables at the same time, and not because of my
>>> current write load.
>>>
>> Ah yeah I wasn't thinking about the capacity problem, more of the
>> performance impact from the node being backed up with compactions. If you
>> haven't already, you should try disable stcs in l0 on the joining node. You
>> will likely still need to do a lot of compactions, but generally they
>> should be smaller. The  option is -Dcassandra.disable_stcs_in_l0=true
>>
>>>  I just noticed you were mentioning L1 tables too. Why would that affect
>>> the disk footprint?
>>
>> If you've been doing a lot of STCS in L0, you generally end up with some
>> large SSTables. These will eventually have to be compacted with L1. Could
>> also be suffering the problem of streamed SSTables causing large
>> cross-level compactions in the higher levels as well.
>> ​
>>
>
>