Posted to user@cassandra.apache.org by Ruchir Jha <ru...@gmail.com> on 2014/08/04 19:41:32 UTC

Node bootstrap

I am trying to bootstrap the thirteenth node into a 12-node cluster where the
average data size per node is about 2.1 TB. The bootstrap streaming has been
going on for 2 days now, and disk usage on the new node is already above 4 TB
and still growing. Is this because the new node is running major compactions
while the streaming is going on?
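
For reference, this is roughly how I am watching it from the new node (a
minimal sketch; the data directory below is the default install location and
may not match other layouts):

# rough checks on the bootstrapping node
nodetool compactionstats | head -2               # pending compaction tasks
nodetool netstats | grep -c "Streaming from"     # how many peers are streaming to us
df -h /var/lib/cassandra                         # on-disk growth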

One thing I noticed that seemed off: the seeds property in the yaml of the
13th node comprises nodes 1..12, whereas the seeds property on the existing
12 nodes consists of all the other nodes except the thirteenth node. Is this
an issue?
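
For reference, the seeds block in the new node's yaml is structured roughly
like this (trimmed to three addresses here purely for illustration):

# cassandra.yaml -- structure only; the three IPs are just an example
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.20.15,10.10.20.27,10.10.20.31"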

Any other insight is appreciated.

Ruchir.

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Still having issues with node bootstrapping. The new node just died: it went
into a full GC pause, and the nodes it had active streams with marked it as
down. After the full GC finished, the new node printed this log:

ERROR 02:52:36,259 Stream failed because /10.10.20.35 died or was
restarted/removed (streams may still be active in background, but further
streams won't be started)

Here 10.10.20.35 is an existing node that the new node was streaming from. A
similar line was printed for every other node in the cluster. Why did the new
node just exit after the full GC pause?
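
To see what leads up to that pause, one option would be GC logging via
cassandra-env.sh, roughly like this (standard HotSpot flags; the log path is
just an example):

# cassandra-env.sh -- illustrative GC logging flags; log path is an example
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"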

We have heap dumps enabled on full GCs, and these are the top offenders on
the new node. A new entry that I noticed is the CompressionMetadata chunks.
Anything I can do to optimize that?

 num     #instances         #bytes  class name
----------------------------------------------
   1:      42508421     4818885752  [B
   2:      65860543     3161306064  java.nio.HeapByteBuffer
   3:     124361093     2984666232  org.apache.cassandra.io.compress.CompressionMetadata$Chunk
   4:      29745665     1427791920  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   5:      29810362      953931584  org.apache.cassandra.db.Column
   6:         31623      498012768  [Lorg.apache.cassandra.io.compress.CompressionMetadata$Chunk;
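
Regarding the CompressionMetadata$Chunk entries above: since an entry is kept
per compression chunk of each sstable, one knob would be a larger chunk length
on the biggest tables. A hedged example only (the table name is a placeholder,
and I have not measured the larger-read trade-off):

-- fewer, larger chunks per sstable means fewer Chunk entries in memory
ALTER TABLE ks.big_table
  WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 256};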



On Tue, Aug 5, 2014 at 2:59 PM, Ruchir Jha <ru...@gmail.com> wrote:

> Also, right now the "top" command shows that we are at 500-700% CPU, and
> we have 23 total processors, which means we have a lot of idle CPU left
> over, so throwing more threads at compaction and flush should alleviate the
> problem?
>
>
> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>>
>> Right now, we have 6 flush writers and compaction_throughput_mb_per_sec
>> is set to 0, which I believe disables throttling.
>>
>> Also, Here is the iostat -x 5 5 output:
>>
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda              10.00  1450.35   50.79   55.92  9775.97 12030.14
>> 204.34     1.56   14.62   1.05  11.21
>> dm-0              0.00     0.00    3.59   18.82   166.52   150.35
>>  14.14     0.44   19.49   0.54   1.22
>> dm-1              0.00     0.00    2.32    5.37    18.56    42.98
>> 8.00     0.76   98.82   0.43   0.33
>> dm-2              0.00     0.00  162.17 5836.66 32714.46 47040.87
>>  13.30     5.57    0.90   0.06  36.00
>> sdb               0.40  4251.90  106.72  107.35 23123.61 35204.09
>> 272.46     4.43   20.68   1.29  27.64
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>                 14.64   10.75    1.81   13.50    0.00   59.29
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda              15.40  1344.60   68.80  145.60  4964.80 11790.40
>>  78.15     0.38    1.80   0.80  17.10
>> dm-0              0.00     0.00   43.00 1186.20  2292.80  9489.60
>> 9.59     4.88    3.90   0.09  11.58
>> dm-1              0.00     0.00    1.60    0.00    12.80     0.00
>> 8.00     0.03   16.00   2.00   0.32
>> dm-2              0.00     0.00  197.20 17583.80 35152.00 140664.00
>> 9.89  2847.50  109.52   0.05  93.50
>> sdb              13.20 16552.20  159.00  742.20 32745.60 129129.60
>> 179.62    72.88   66.01   1.04  93.42
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>                   15.51   19.77    1.97    5.02    0.00   57.73
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda              16.20   523.40   60.00  285.00  5220.80  5913.60
>>  32.27     0.25    0.72   0.60  20.86
>> dm-0              0.00     0.00    0.80    1.40    32.00    11.20
>>  19.64     0.01    3.18   1.55   0.34
>> dm-1              0.00     0.00    1.60    0.00    12.80     0.00
>> 8.00     0.03   21.00   2.62   0.42
>> dm-2              0.00     0.00  339.40 5886.80 66219.20 47092.80
>>  18.20   251.66  184.72   0.10  63.48
>> sdb               1.00  5025.40  264.20  209.20 60992.00 50422.40
>> 235.35     5.98   40.92   1.23  58.28
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>                   16.59   16.34    2.03    9.01    0.00   56.04
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
>> avgrq-sz avgqu-sz   await  svctm  %util
>> sda               5.40   320.00   37.40  159.80  2483.20  3529.60
>>  30.49     0.10    0.52   0.39   7.76
>> dm-0              0.00     0.00    0.20    3.60     1.60    28.80
>> 8.00     0.00    0.68   0.68   0.26
>> dm-1              0.00     0.00    0.00    0.00     0.00     0.00
>> 0.00     0.00    0.00   0.00   0.00
>> dm-2              0.00     0.00  287.20 13108.20 53985.60 104864.00
>>  11.86   869.18   48.82   0.06  76.96
>> sdb               5.20 12163.40  238.20  532.00 51235.20 93753.60
>> 188.25    21.46   23.75   0.97  75.08
>>
>>
>>
>> On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy <ma...@boxever.com>
>> wrote:
>>
>>> Hi Ruchir,
>>>
>>> With the large number of blocked flushes and the number of pending
>>> compactions would still indicate IO contention. Can you post the output of
>>> 'iostat -x 5 5'
>>>
>>> If you do in fact have spare IO, there are several configuration options
>>> you can tune such as increasing the number of flush writers and
>>> compaction_throughput_mb_per_sec
>>>
>>> Mark
>>>
>>>
>>> On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>>
>>>> Also Mark to your comment on my tpstats output, below is my iostat
>>>> output, and the iowait is at 4.59%, which means no IO pressure, but we are
>>>> still seeing the bad flush performance. Should we try increasing the flush
>>>> writers?
>>>>
>>>>
>>>> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
>>>>  _x86_64_        (24 CPU)
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>                   5.80   10.25    0.65    4.59    0.00   78.72
>>>>
>>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>>> sda             103.83      9630.62     11982.60 3231174328 4020290310
>>>> dm-0             13.57       160.17        81.12   53739546   27217432
>>>> dm-1              7.59        16.94        43.77    5682200   14686784
>>>> dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
>>>> sdb             206.09     22789.19     33569.27 7646015080 11262843224
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> nodetool status:
>>>>>
>>>>> Datacenter: datacenter1
>>>>> =======================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load       Tokens  Owns (effective)  Host ID
>>>>>                       Rack
>>>>> UN  10.10.20.27  1.89 TB    256     25.4%
>>>>> 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>>>>> UN  10.10.20.62  1.83 TB    256     25.5%
>>>>> 84b47313-da75-4519-94f3-3951d554a3e5  rack1
>>>>> UN  10.10.20.47  1.87 TB    256     24.7%
>>>>> bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>>>>> UN  10.10.20.45  1.7 TB     256     22.6%
>>>>> 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>>>>> UN  10.10.20.15  1.86 TB    256     24.5%
>>>>> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>>>>> UN  10.10.20.31  1.87 TB    256     24.9%
>>>>> 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>>>>> UN  10.10.20.35  1.86 TB    256     25.8%
>>>>> 17cb8772-2444-46ff-8525-33746514727d  rack1
>>>>> UN  10.10.20.51  1.89 TB    256     25.0%
>>>>> 0343cd58-3686-465f-8280-56fb72d161e2  rack1
>>>>> UN  10.10.20.19  1.91 TB    256     25.5%
>>>>> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>>>>> UN  10.10.20.39  1.93 TB    256     26.0%
>>>>> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>>>>> UN  10.10.20.52  1.81 TB    256     25.4%
>>>>> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>>>>> UN  10.10.20.22  1.89 TB    256     24.8%
>>>>> 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>>>>
>>>>>
>>>>> Note: The new node is not part of the above list.
>>>>>
>>>>> nodetool compactionstats:
>>>>>
>>>>> pending tasks: 1649
>>>>>           compaction type        keyspace   column family
>>>>> completed           total      unit  progress
>>>>>                Compaction           iprod   customerorder
>>>>>  1682804084     17956558077     bytes     9.37%
>>>>>                Compaction            prodgatecustomerorder
>>>>>  1664239271      1693502275     bytes    98.27%
>>>>>                Compaction  qa_config_bkupfixsessionconfig_hist
>>>>>    2443           27253     bytes     8.96%
>>>>>                Compaction            prodgatecustomerorder_hist
>>>>>  1770577280      5026699390     bytes    35.22%
>>>>>                Compaction           iprodgatecustomerorder_hist
>>>>>  2959560205    312350192622     bytes     0.95%
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>>>>> wrote:
>>>>>
>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>> including the new one.
>>>>>>
>>>>>>
>>>>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>>>>> commented out, this means you are using vnodes and the new node will
>>>>>> automatically grab a list of tokens to take over responsibility for.
>>>>>>
>>>>>> Pool Name                    Active   Pending      Completed
>>>>>>> Blocked  All time blocked
>>>>>>> FlushWriter                       0         0           1136
>>>>>>> 0               512
>>>>>>>
>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>
>>>>>>
>>>>>> This is a problem as it indicates that the IO system cannot keep up.
>>>>>>
>>>>>> Just ran this on the new node:
>>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>>> 10
>>>>>>
>>>>>>
>>>>>> This is normal as the new node will most likely take tokens from all
>>>>>> nodes in the cluster.
>>>>>>
>>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>>
>>>>>>
>>>>>> Seeds are only used when a new node is bootstrapping into the cluster
>>>>>> and needs a set of ips to contact and discover the cluster, so this would
>>>>>> have no impact on data sizes or streaming. In general it would be
>>>>>> considered best practice to have a set of 2-3 seeds from each data center,
>>>>>> with all nodes having the same seed list.
>>>>>>
>>>>>>
>>>>>> What is the current output of 'nodetool compactionstats'? Could you
>>>>>> also paste the output of nodetool status <keyspace>?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry for the multiple updates, but another thing I found was all
>>>>>>> the other existing nodes have themselves in the seeds list, but the new
>>>>>>> node does not have itself in the seeds list. Can that cause this issue?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just ran this on the new node:
>>>>>>>>
>>>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>>>> 10
>>>>>>>>
>>>>>>>> Seems like the new node is receiving data from 10 other nodes. Is
>>>>>>>> that expected in a vnodes enabled environment?
>>>>>>>>
>>>>>>>> Ruchir.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>>>>> tpstats output:
>>>>>>>>>
>>>>>>>>> Pool Name                    Active   Pending      Completed
>>>>>>>>> Blocked  All time blocked
>>>>>>>>> FlushWriter                       0         0           1136
>>>>>>>>>   0               512
>>>>>>>>>
>>>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>>>>> including the new one.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <
>>>>>>>>>> mark.reddy@boxever.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> My understanding was that if initial_token is left empty on the
>>>>>>>>>>>> new node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the
>>>>>>>>>>> new node will take token ranges dynamically. What is the configuration of
>>>>>>>>>>> your other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ruchir.jha@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>>>>
>>>>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>>>>
>>>>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java
>>>>>>>>>>>> (line 400) Writing Memtable
>>>>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>>>>
>>>>>>>>>>>> so basically it is just busy flushing, and compacting. Would
>>>>>>>>>>>> you have any ideas on why the 2x disk space blow up. My understanding was
>>>>>>>>>>>> that if initial_token is left empty on the new node, it just contacts the
>>>>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>>>>> compaction is falling behind?
>>>>>>>>>>>>
>>>>>>>>>>>> Ruchir
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ruchir,
>>>>>>>>>>>>>
>>>>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>>>>
>>>>>>>>>>>>> With respect to the seed list, it is generally advisable to
>>>>>>>>>>>>> use 3 seed nodes per AZ / DC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <
>>>>>>>>>>>>> ruchir.jha@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One thing that I noticed that seemed off was the seeds
>>>>>>>>>>>>>> property in the yaml of the 13th node comprises of 1..12. Where as the
>>>>>>>>>>>>>> seeds property on the existing 12 nodes consists of all the other nodes
>>>>>>>>>>>>>> except the thirteenth node. Is this an issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ruchir.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Patricia Gorla
>>>>>>>>>>>>> @patriciagorla
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consultant
>>>>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Also, right now the "top" command shows that we are at 500-700% CPU, and we
have 23 total processors, which means we have a lot of idle CPU left over. So
should throwing more threads at compaction and flush alleviate the problem?
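
If we go that route, I assume it would be roughly these lines in
cassandra.yaml (values are only a guess, and both changes need a restart):

# cassandra.yaml -- illustrative values only
concurrent_compactors: 8    # I believe this defaults to min(number of data disks, cores)
memtable_flush_writers: 8   # currently 6, per my earlier mail quoted below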


On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:

>
> Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is
> set to 0, which I believe disables throttling.
>
> Also, Here is the iostat -x 5 5 output:
>
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda              10.00  1450.35   50.79   55.92  9775.97 12030.14   204.34
>     1.56   14.62   1.05  11.21
> dm-0              0.00     0.00    3.59   18.82   166.52   150.35    14.14
>     0.44   19.49   0.54   1.22
> dm-1              0.00     0.00    2.32    5.37    18.56    42.98     8.00
>     0.76   98.82   0.43   0.33
> dm-2              0.00     0.00  162.17 5836.66 32714.46 47040.87    13.30
>     5.57    0.90   0.06  36.00
> sdb               0.40  4251.90  106.72  107.35 23123.61 35204.09   272.46
>     4.43   20.68   1.29  27.64
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                 14.64   10.75    1.81   13.50    0.00   59.29
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda              15.40  1344.60   68.80  145.60  4964.80 11790.40    78.15
>     0.38    1.80   0.80  17.10
> dm-0              0.00     0.00   43.00 1186.20  2292.80  9489.60     9.59
>     4.88    3.90   0.09  11.58
> dm-1              0.00     0.00    1.60    0.00    12.80     0.00     8.00
>     0.03   16.00   2.00   0.32
> dm-2              0.00     0.00  197.20 17583.80 35152.00 140664.00
> 9.89  2847.50  109.52   0.05  93.50
> sdb              13.20 16552.20  159.00  742.20 32745.60 129129.60
> 179.62    72.88   66.01   1.04  93.42
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                   15.51   19.77    1.97    5.02    0.00   57.73
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda              16.20   523.40   60.00  285.00  5220.80  5913.60    32.27
>     0.25    0.72   0.60  20.86
> dm-0              0.00     0.00    0.80    1.40    32.00    11.20    19.64
>     0.01    3.18   1.55   0.34
> dm-1              0.00     0.00    1.60    0.00    12.80     0.00     8.00
>     0.03   21.00   2.62   0.42
> dm-2              0.00     0.00  339.40 5886.80 66219.20 47092.80    18.20
>   251.66  184.72   0.10  63.48
> sdb               1.00  5025.40  264.20  209.20 60992.00 50422.40   235.35
>     5.98   40.92   1.23  58.28
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                   16.59   16.34    2.03    9.01    0.00   56.04
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda               5.40   320.00   37.40  159.80  2483.20  3529.60    30.49
>     0.10    0.52   0.39   7.76
> dm-0              0.00     0.00    0.20    3.60     1.60    28.80     8.00
>     0.00    0.68   0.68   0.26
> dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00
>     0.00    0.00   0.00   0.00
> dm-2              0.00     0.00  287.20 13108.20 53985.60 104864.00
>  11.86   869.18   48.82   0.06  76.96
> sdb               5.20 12163.40  238.20  532.00 51235.20 93753.60   188.25
>    21.46   23.75   0.97  75.08
>
>
>
> On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy <ma...@boxever.com> wrote:
>
>> Hi Ruchir,
>>
>> With the large number of blocked flushes and the number of pending
>> compactions would still indicate IO contention. Can you post the output of
>> 'iostat -x 5 5'
>>
>> If you do in fact have spare IO, there are several configuration options
>> you can tune such as increasing the number of flush writers and
>> compaction_throughput_mb_per_sec
>>
>> Mark
>>
>>
>> On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Also Mark to your comment on my tpstats output, below is my iostat
>>> output, and the iowait is at 4.59%, which means no IO pressure, but we are
>>> still seeing the bad flush performance. Should we try increasing the flush
>>> writers?
>>>
>>>
>>> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
>>>  _x86_64_        (24 CPU)
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>                   5.80   10.25    0.65    4.59    0.00   78.72
>>>
>>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>>> sda             103.83      9630.62     11982.60 3231174328 4020290310
>>> dm-0             13.57       160.17        81.12   53739546   27217432
>>> dm-1              7.59        16.94        43.77    5682200   14686784
>>> dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
>>> sdb             206.09     22789.19     33569.27 7646015080 11262843224
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> nodetool status:
>>>>
>>>> Datacenter: datacenter1
>>>> =======================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address      Load       Tokens  Owns (effective)  Host ID
>>>>                     Rack
>>>> UN  10.10.20.27  1.89 TB    256     25.4%
>>>> 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>>>> UN  10.10.20.62  1.83 TB    256     25.5%
>>>> 84b47313-da75-4519-94f3-3951d554a3e5  rack1
>>>> UN  10.10.20.47  1.87 TB    256     24.7%
>>>> bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>>>> UN  10.10.20.45  1.7 TB     256     22.6%
>>>> 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>>>> UN  10.10.20.15  1.86 TB    256     24.5%
>>>> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>>>> UN  10.10.20.31  1.87 TB    256     24.9%
>>>> 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>>>> UN  10.10.20.35  1.86 TB    256     25.8%
>>>> 17cb8772-2444-46ff-8525-33746514727d  rack1
>>>> UN  10.10.20.51  1.89 TB    256     25.0%
>>>> 0343cd58-3686-465f-8280-56fb72d161e2  rack1
>>>> UN  10.10.20.19  1.91 TB    256     25.5%
>>>> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>>>> UN  10.10.20.39  1.93 TB    256     26.0%
>>>> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>>>> UN  10.10.20.52  1.81 TB    256     25.4%
>>>> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>>>> UN  10.10.20.22  1.89 TB    256     24.8%
>>>> 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>>>
>>>>
>>>> Note: The new node is not part of the above list.
>>>>
>>>> nodetool compactionstats:
>>>>
>>>> pending tasks: 1649
>>>>           compaction type        keyspace   column family
>>>> completed           total      unit  progress
>>>>                Compaction           iprod   customerorder
>>>>  1682804084     17956558077     bytes     9.37%
>>>>                Compaction            prodgatecustomerorder
>>>>  1664239271      1693502275     bytes    98.27%
>>>>                Compaction  qa_config_bkupfixsessionconfig_hist
>>>>    2443           27253     bytes     8.96%
>>>>                Compaction            prodgatecustomerorder_hist
>>>>  1770577280      5026699390     bytes    35.22%
>>>>                Compaction           iprodgatecustomerorder_hist
>>>>  2959560205    312350192622     bytes     0.95%
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>>>> wrote:
>>>>
>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>> including the new one.
>>>>>
>>>>>
>>>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>>>> commented out, this means you are using vnodes and the new node will
>>>>> automatically grab a list of tokens to take over responsibility for.
>>>>>
>>>>> Pool Name                    Active   Pending      Completed   Blocked
>>>>>>  All time blocked
>>>>>> FlushWriter                       0         0           1136
>>>>>> 0               512
>>>>>>
>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>
>>>>>
>>>>> This is a problem as it indicates that the IO system cannot keep up.
>>>>>
>>>>> Just ran this on the new node:
>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>> 10
>>>>>
>>>>>
>>>>> This is normal as the new node will most likely take tokens from all
>>>>> nodes in the cluster.
>>>>>
>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>
>>>>>
>>>>> Seeds are only used when a new node is bootstrapping into the cluster
>>>>> and needs a set of ips to contact and discover the cluster, so this would
>>>>> have no impact on data sizes or streaming. In general it would be
>>>>> considered best practice to have a set of 2-3 seeds from each data center,
>>>>> with all nodes having the same seed list.
>>>>>
>>>>>
>>>>> What is the current output of 'nodetool compactionstats'? Could you
>>>>> also paste the output of nodetool status <keyspace>?
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Just ran this on the new node:
>>>>>>>
>>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>>> 10
>>>>>>>
>>>>>>> Seems like the new node is receiving data from 10 other nodes. Is
>>>>>>> that expected in a vnodes enabled environment?
>>>>>>>
>>>>>>> Ruchir.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>>>> tpstats output:
>>>>>>>>
>>>>>>>> Pool Name                    Active   Pending      Completed
>>>>>>>> Blocked  All time blocked
>>>>>>>> FlushWriter                       0         0           1136
>>>>>>>>   0               512
>>>>>>>>
>>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>>>> including the new one.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <
>>>>>>>>> mark.reddy@boxever.com> wrote:
>>>>>>>>>
>>>>>>>>>> My understanding was that if initial_token is left empty on the
>>>>>>>>>>> new node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the
>>>>>>>>>> new node will take token ranges dynamically. What is the configuration of
>>>>>>>>>> your other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>>>
>>>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>>>
>>>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java
>>>>>>>>>>> (line 400) Writing Memtable
>>>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>>>
>>>>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>>>> compaction is falling behind?
>>>>>>>>>>>
>>>>>>>>>>> Ruchir
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Ruchir,
>>>>>>>>>>>>
>>>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>>>
>>>>>>>>>>>> With respect to the seed list, it is generally advisable to use
>>>>>>>>>>>> 3 seed nodes per AZ / DC.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <
>>>>>>>>>>>> ruchir.jha@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>>>
>>>>>>>>>>>>> One thing that I noticed that seemed off was the seeds
>>>>>>>>>>>>> property in the yaml of the 13th node comprises of 1..12. Where as the
>>>>>>>>>>>>> seeds property on the existing 12 nodes consists of all the other nodes
>>>>>>>>>>>>> except the thirteenth node. Is this an issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ruchir.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Patricia Gorla
>>>>>>>>>>>> @patriciagorla
>>>>>>>>>>>>
>>>>>>>>>>>> Consultant
>>>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is
set to 0, which I believe disables throttling.
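
Concretely, I believe the relevant cassandra.yaml lines currently read
(paraphrased):

memtable_flush_writers: 6
compaction_throughput_mb_per_sec: 0    # 0 disables compaction throttling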

Also, here is the iostat -x 5 5 output:


Device:         rrqm/s   wrqm/s     r/s      w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              10.00  1450.35   50.79    55.92   9775.97  12030.14    204.34      1.56   14.62   1.05  11.21
dm-0              0.00     0.00    3.59    18.82    166.52    150.35     14.14      0.44   19.49   0.54   1.22
dm-1              0.00     0.00    2.32     5.37     18.56     42.98      8.00      0.76   98.82   0.43   0.33
dm-2              0.00     0.00  162.17  5836.66  32714.46  47040.87     13.30      5.57    0.90   0.06  36.00
sdb               0.40  4251.90  106.72   107.35  23123.61  35204.09    272.46      4.43   20.68   1.29  27.64

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.64   10.75    1.81   13.50    0.00   59.29

Device:         rrqm/s   wrqm/s     r/s      w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              15.40  1344.60   68.80   145.60   4964.80  11790.40     78.15      0.38    1.80   0.80  17.10
dm-0              0.00     0.00   43.00  1186.20   2292.80   9489.60      9.59      4.88    3.90   0.09  11.58
dm-1              0.00     0.00    1.60     0.00     12.80      0.00      8.00      0.03   16.00   2.00   0.32
dm-2              0.00     0.00  197.20 17583.80  35152.00 140664.00      9.89   2847.50  109.52   0.05  93.50
sdb              13.20 16552.20  159.00   742.20  32745.60 129129.60    179.62     72.88   66.01   1.04  93.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.51   19.77    1.97    5.02    0.00   57.73

Device:         rrqm/s   wrqm/s     r/s      w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda              16.20   523.40   60.00   285.00   5220.80   5913.60     32.27      0.25    0.72   0.60  20.86
dm-0              0.00     0.00    0.80     1.40     32.00     11.20     19.64      0.01    3.18   1.55   0.34
dm-1              0.00     0.00    1.60     0.00     12.80      0.00      8.00      0.03   21.00   2.62   0.42
dm-2              0.00     0.00  339.40  5886.80  66219.20  47092.80     18.20    251.66  184.72   0.10  63.48
sdb               1.00  5025.40  264.20   209.20  60992.00  50422.40    235.35      5.98   40.92   1.23  58.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.59   16.34    2.03    9.01    0.00   56.04

Device:         rrqm/s   wrqm/s     r/s      w/s    rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda               5.40   320.00   37.40   159.80   2483.20   3529.60     30.49      0.10    0.52   0.39   7.76
dm-0              0.00     0.00    0.20     3.60      1.60     28.80      8.00      0.00    0.68   0.68   0.26
dm-1              0.00     0.00    0.00     0.00      0.00      0.00      0.00      0.00    0.00   0.00   0.00
dm-2              0.00     0.00  287.20 13108.20  53985.60 104864.00     11.86    869.18   48.82   0.06  76.96
sdb               5.20 12163.40  238.20   532.00  51235.20  93753.60    188.25     21.46   23.75   0.97  75.08



On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy <ma...@boxever.com> wrote:

> Hi Ruchir,
>
> With the large number of blocked flushes and the number of pending
> compactions would still indicate IO contention. Can you post the output of
> 'iostat -x 5 5'
>
> If you do in fact have spare IO, there are several configuration options
> you can tune such as increasing the number of flush writers and
> compaction_throughput_mb_per_sec
>
> Mark
>
>
> On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Also Mark to your comment on my tpstats output, below is my iostat
>> output, and the iowait is at 4.59%, which means no IO pressure, but we are
>> still seeing the bad flush performance. Should we try increasing the flush
>> writers?
>>
>>
>> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
>>  _x86_64_        (24 CPU)
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>                   5.80   10.25    0.65    4.59    0.00   78.72
>>
>> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
>> sda             103.83      9630.62     11982.60 3231174328 4020290310
>> dm-0             13.57       160.17        81.12   53739546   27217432
>> dm-1              7.59        16.94        43.77    5682200   14686784
>> dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
>> sdb             206.09     22789.19     33569.27 7646015080 11262843224
>>
>>
>>
>> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> nodetool status:
>>>
>>> Datacenter: datacenter1
>>> =======================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address      Load       Tokens  Owns (effective)  Host ID
>>>                     Rack
>>> UN  10.10.20.27  1.89 TB    256     25.4%
>>> 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>>> UN  10.10.20.62  1.83 TB    256     25.5%
>>> 84b47313-da75-4519-94f3-3951d554a3e5  rack1
>>> UN  10.10.20.47  1.87 TB    256     24.7%
>>> bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>>> UN  10.10.20.45  1.7 TB     256     22.6%
>>> 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>>> UN  10.10.20.15  1.86 TB    256     24.5%
>>> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>>> UN  10.10.20.31  1.87 TB    256     24.9%
>>> 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>>> UN  10.10.20.35  1.86 TB    256     25.8%
>>> 17cb8772-2444-46ff-8525-33746514727d  rack1
>>> UN  10.10.20.51  1.89 TB    256     25.0%
>>> 0343cd58-3686-465f-8280-56fb72d161e2  rack1
>>> UN  10.10.20.19  1.91 TB    256     25.5%
>>> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>>> UN  10.10.20.39  1.93 TB    256     26.0%
>>> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>>> UN  10.10.20.52  1.81 TB    256     25.4%
>>> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>>> UN  10.10.20.22  1.89 TB    256     24.8%
>>> 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>>
>>>
>>> Note: The new node is not part of the above list.
>>>
>>> nodetool compactionstats:
>>>
>>> pending tasks: 1649
>>>           compaction type        keyspace   column family
>>> completed           total      unit  progress
>>>                Compaction           iprod   customerorder
>>>  1682804084     17956558077     bytes     9.37%
>>>                Compaction            prodgatecustomerorder
>>>  1664239271      1693502275     bytes    98.27%
>>>                Compaction  qa_config_bkupfixsessionconfig_hist
>>>  2443           27253     bytes     8.96%
>>>                Compaction            prodgatecustomerorder_hist
>>>  1770577280      5026699390     bytes    35.22%
>>>                Compaction           iprodgatecustomerorder_hist
>>>  2959560205    312350192622     bytes     0.95%
>>>
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>>> wrote:
>>>
>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>> including the new one.
>>>>
>>>>
>>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>>> commented out, this means you are using vnodes and the new node will
>>>> automatically grab a list of tokens to take over responsibility for.
>>>>
>>>> Pool Name                    Active   Pending      Completed   Blocked
>>>>>  All time blocked
>>>>> FlushWriter                       0         0           1136         0
>>>>>               512
>>>>>
>>>>> Looks like about 50% of flushes are blocked.
>>>>>
>>>>
>>>> This is a problem as it indicates that the IO system cannot keep up.
>>>>
>>>> Just ran this on the new node:
>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>> 10
>>>>
>>>>
>>>> This is normal as the new node will most likely take tokens from all
>>>> nodes in the cluster.
>>>>
>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>
>>>>
>>>> Seeds are only used when a new node is bootstrapping into the cluster
>>>> and needs a set of ips to contact and discover the cluster, so this would
>>>> have no impact on data sizes or streaming. In general it would be
>>>> considered best practice to have a set of 2-3 seeds from each data center,
>>>> with all nodes having the same seed list.
>>>>
>>>>
>>>> What is the current output of 'nodetool compactionstats'? Could you
>>>> also paste the output of nodetool status <keyspace>?
>>>>
>>>> Mark
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Just ran this on the new node:
>>>>>>
>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>> 10
>>>>>>
>>>>>> Seems like the new node is receiving data from 10 other nodes. Is
>>>>>> that expected in a vnodes enabled environment?
>>>>>>
>>>>>> Ruchir.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>>> tpstats output:
>>>>>>>
>>>>>>> Pool Name                    Active   Pending      Completed
>>>>>>> Blocked  All time blocked
>>>>>>> FlushWriter                       0         0           1136
>>>>>>> 0               512
>>>>>>>
>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>>> including the new one.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <mark.reddy@boxever.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> My understanding was that if initial_token is left empty on the
>>>>>>>>>> new node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>>
>>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>>
>>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>>>> 400) Writing Memtable
>>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>>
>>>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>>> compaction is falling behind?
>>>>>>>>>>
>>>>>>>>>> Ruchir
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ruchir,
>>>>>>>>>>>
>>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>>
>>>>>>>>>>> With respect to the seed list, it is generally advisable to use
>>>>>>>>>>> 3 seed nodes per AZ / DC.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <
>>>>>>>>>>> ruchir.jha@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>>
>>>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>>
>>>>>>>>>>>> Ruchir.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Patricia Gorla
>>>>>>>>>>> @patriciagorla
>>>>>>>>>>>
>>>>>>>>>>> Consultant
>>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Mark Reddy <ma...@boxever.com>.
Hi Ruchir,

The large number of blocked flushes and the number of pending compactions
would still indicate IO contention. Can you post the output of
'iostat -x 5 5'?

If you do in fact have spare IO, there are several configuration options you
can tune, such as increasing the number of flush writers and
compaction_throughput_mb_per_sec.
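
For example (illustrative numbers only; the yaml changes need a restart, while
compaction throughput can also be adjusted at runtime):

# cassandra.yaml -- illustrative values only
memtable_flush_writers: 6
compaction_throughput_mb_per_sec: 32

# or, for compaction throughput, live via nodetool:
nodetool setcompactionthroughput 32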

Mark


On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:

> Also Mark to your comment on my tpstats output, below is my iostat output,
> and the iowait is at 4.59%, which means no IO pressure, but we are still
> seeing the bad flush performance. Should we try increasing the flush
> writers?
>
>
> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
>  _x86_64_        (24 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                   5.80   10.25    0.65    4.59    0.00   78.72
>
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda             103.83      9630.62     11982.60 3231174328 4020290310
> dm-0             13.57       160.17        81.12   53739546   27217432
> dm-1              7.59        16.94        43.77    5682200   14686784
> dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
> sdb             206.09     22789.19     33569.27 7646015080 11262843224
>
>
>
> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> nodetool status:
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address      Load       Tokens  Owns (effective)  Host ID
>>                   Rack
>> UN  10.10.20.27  1.89 TB    256     25.4%
>> 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>> UN  10.10.20.62  1.83 TB    256     25.5%
>> 84b47313-da75-4519-94f3-3951d554a3e5  rack1
>> UN  10.10.20.47  1.87 TB    256     24.7%
>> bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>> UN  10.10.20.45  1.7 TB     256     22.6%
>> 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>> UN  10.10.20.15  1.86 TB    256     24.5%
>> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>> UN  10.10.20.31  1.87 TB    256     24.9%
>> 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>> UN  10.10.20.35  1.86 TB    256     25.8%
>> 17cb8772-2444-46ff-8525-33746514727d  rack1
>> UN  10.10.20.51  1.89 TB    256     25.0%
>> 0343cd58-3686-465f-8280-56fb72d161e2  rack1
>> UN  10.10.20.19  1.91 TB    256     25.5%
>> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>> UN  10.10.20.39  1.93 TB    256     26.0%
>> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>> UN  10.10.20.52  1.81 TB    256     25.4%
>> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>> UN  10.10.20.22  1.89 TB    256     24.8%
>> 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>
>>
>> Note: The new node is not part of the above list.
>>
>> nodetool compactionstats:
>>
>> pending tasks: 1649
>>           compaction type        keyspace   column family       completed
>>           total      unit  progress
>>                Compaction           iprod   customerorder      1682804084
>>     17956558077     bytes     9.37%
>>                Compaction            prodgatecustomerorder
>>  1664239271      1693502275     bytes    98.27%
>>                Compaction  qa_config_bkupfixsessionconfig_hist
>>  2443           27253     bytes     8.96%
>>                Compaction            prodgatecustomerorder_hist
>>  1770577280      5026699390     bytes    35.22%
>>                Compaction           iprodgatecustomerorder_hist
>>  2959560205    312350192622     bytes     0.95%
>>
>>
>>
>>
>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>> wrote:
>>
>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>> including the new one.
>>>
>>>
>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>> commented out, this means you are using vnodes and the new node will
>>> automatically grab a list of tokens to take over responsibility for.
>>>
>>> Pool Name                    Active   Pending      Completed   Blocked
>>>>  All time blocked
>>>> FlushWriter                       0         0           1136         0
>>>>               512
>>>>
>>>> Looks like about 50% of flushes are blocked.
>>>>
>>>
>>> This is a problem as it indicates that the IO system cannot keep up.
>>>
>>> Just ran this on the new node:
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>
>>>
>>> This is normal as the new node will most likely take tokens from all
>>> nodes in the cluster.
>>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>>
>>> Seeds are only used when a new node is bootstrapping into the cluster
>>> and needs a set of ips to contact and discover the cluster, so this would
>>> have no impact on data sizes or streaming. In general it would be
>>> considered best practice to have a set of 2-3 seeds from each data center,
>>> with all nodes having the same seed list.
>>>
>>>
>>> What is the current output of 'nodetool compactionstats'? Could you also
>>> paste the output of nodetool status <keyspace>?
>>>
>>> Mark
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>>
>>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Just ran this on the new node:
>>>>>
>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>> 10
>>>>>
>>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>>> expected in a vnodes enabled environment?
>>>>>
>>>>> Ruchir.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>> tpstats output:
>>>>>>
>>>>>> Pool Name                    Active   Pending      Completed
>>>>>> Blocked  All time blocked
>>>>>> FlushWriter                       0         0           1136
>>>>>> 0               512
>>>>>>
>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>> including the new one.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>
>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>
>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>>> 400) Writing Memtable
>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>
>>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>> compaction is falling behind?
>>>>>>>>>
>>>>>>>>> Ruchir
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ruchir,
>>>>>>>>>>
>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>
>>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir.jha@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>
>>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>>
>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>
>>>>>>>>>>> Ruchir.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Patricia Gorla
>>>>>>>>>> @patriciagorla
>>>>>>>>>>
>>>>>>>>>> Consultant
>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Also, Mark, to your comment on my tpstats output: below is my iostat output.
The iowait is at 4.59%, which suggests no IO pressure, but we are still seeing
the bad flush performance. Should we try increasing the flush writers?


Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
 _x86_64_        (24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                  5.80   10.25    0.65    4.59    0.00   78.72

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             103.83      9630.62     11982.60 3231174328 4020290310
dm-0             13.57       160.17        81.12   53739546   27217432
dm-1              7.59        16.94        43.77    5682200   14686784
dm-2           5792.76     32242.66     45427.12 10817753530 15241278360
sdb             206.09     22789.19     33569.27 7646015080 11262843224
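
To see whether the blocked flush count keeps climbing while this runs, I am
thinking of something along these lines (sketch only):

# sample FlushWriter blocked counts every minute on the new node
while true; do
  date
  nodetool tpstats | grep -E 'Pool Name|FlushWriter'
  sleep 60
done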



On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com> wrote:

> nodetool status:
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID
>                   Rack
> UN  10.10.20.27  1.89 TB    256     25.4%
> 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
> UN  10.10.20.62  1.83 TB    256     25.5%
> 84b47313-da75-4519-94f3-3951d554a3e5  rack1
> UN  10.10.20.47  1.87 TB    256     24.7%
> bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
> UN  10.10.20.45  1.7 TB     256     22.6%
> 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
> UN  10.10.20.15  1.86 TB    256     24.5%
> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
> UN  10.10.20.31  1.87 TB    256     24.9%
> 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
> UN  10.10.20.35  1.86 TB    256     25.8%
> 17cb8772-2444-46ff-8525-33746514727d  rack1
> UN  10.10.20.51  1.89 TB    256     25.0%
> 0343cd58-3686-465f-8280-56fb72d161e2  rack1
> UN  10.10.20.19  1.91 TB    256     25.5%
> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
> UN  10.10.20.39  1.93 TB    256     26.0%
> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
> UN  10.10.20.52  1.81 TB    256     25.4%
> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
> UN  10.10.20.22  1.89 TB    256     24.8%
> 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>
>
> Note: The new node is not part of the above list.
>
> nodetool compactionstats:
>
> pending tasks: 1649
>           compaction type        keyspace   column family       completed
>           total      unit  progress
>                Compaction           iprod   customerorder      1682804084
>     17956558077     bytes     9.37%
>                Compaction            prodgatecustomerorder      1664239271
>      1693502275     bytes    98.27%
>                Compaction  qa_config_bkupfixsessionconfig_hist
>  2443           27253     bytes     8.96%
>                Compaction            prodgatecustomerorder_hist
>  1770577280      5026699390     bytes    35.22%
>                Compaction           iprodgatecustomerorder_hist
>  2959560205    312350192622     bytes     0.95%
>
>
>
>
> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
> wrote:
>
>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>> including the new one.
>>
>>
>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>> commented out, this means you are using vnodes and the new node will
>> automatically grab a list of tokens to take over responsibility for.
>>
>> Pool Name                    Active   Pending      Completed   Blocked
>>>  All time blocked
>>> FlushWriter                       0         0           1136         0
>>>             512
>>>
>>> Looks like about 50% of flushes are blocked.
>>>
>>
>> This is a problem as it indicates that the IO system cannot keep up.
>>
>> Just ran this on the new node:
>>> nodetool netstats | grep "Streaming from" | wc -l
>>> 10
>>
>>
>> This is normal as the new node will most likely take tokens from all
>> nodes in the cluster.
>>
>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>
>>
>> Seeds are only used when a new node is bootstrapping into the cluster and
>> needs a set of ips to contact and discover the cluster, so this would have
>> no impact on data sizes or streaming. In general it would be considered
>> best practice to have a set of 2-3 seeds from each data center, with all
>> nodes having the same seed list.
>>
>>
>> What is the current output of 'nodetool compactionstats'? Could you also
>> paste the output of nodetool status <keyspace>?
>>
>> Mark
>>
>>
>>
>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> Just ran this on the new node:
>>>>
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>>
>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>> expected in a vnodes enabled environment?
>>>>
>>>> Ruchir.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>> tpstats output:
>>>>>
>>>>> Pool Name                    Active   Pending      Completed   Blocked
>>>>>  All time blocked
>>>>> FlushWriter                       0         0           1136         0
>>>>>               512
>>>>>
>>>>> Looks like about 50% of flushes are blocked.
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>> including the new one.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>
>>>>>>>
>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Patricia for your response!
>>>>>>>>
>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>
>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>> 400) Writing Memtable
>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>
>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>> compaction is falling behind?
>>>>>>>>
>>>>>>>> Ruchir
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Ruchir,
>>>>>>>>>
>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>
>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>
>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>
>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>
>>>>>>>>>> Ruchir.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Patricia Gorla
>>>>>>>>> @patriciagorla
>>>>>>>>>
>>>>>>>>> Consultant
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
nodetool status:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN  10.10.20.62  1.83 TB  256     25.5%             84b47313-da75-4519-94f3-3951d554a3e5  rack1
UN  10.10.20.47  1.87 TB  256     24.7%             bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
UN  10.10.20.45  1.7 TB   256     22.6%             8d6bce33-8179-4660-8443-2cf822074ca4  rack1
UN  10.10.20.15  1.86 TB  256     24.5%             01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
UN  10.10.20.31  1.87 TB  256     24.9%             1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
UN  10.10.20.35  1.86 TB  256     25.8%             17cb8772-2444-46ff-8525-33746514727d  rack1
UN  10.10.20.51  1.89 TB  256     25.0%             0343cd58-3686-465f-8280-56fb72d161e2  rack1
UN  10.10.20.19  1.91 TB  256     25.5%             30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
UN  10.10.20.39  1.93 TB  256     26.0%             b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
UN  10.10.20.52  1.81 TB  256     25.4%             6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
UN  10.10.20.22  1.89 TB  256     24.8%             46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1


Note: The new node is not part of the above list.

nodetool compactionstats:

pending tasks: 1649
   compaction type   keyspace   column family                completed      total           unit   progress
   Compaction        iprod      customerorder                1682804084     17956558077     bytes      9.37%
   Compaction        prodgatecustomerorder                   1664239271     1693502275      bytes     98.27%
   Compaction        qa_config_bkupfixsessionconfig_hist     2443           27253           bytes      8.96%
   Compaction        prodgatecustomerorder_hist              1770577280     5026699390      bytes     35.22%
   Compaction        iprodgatecustomerorder_hist             2959560205     312350192622    bytes      0.95%
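
For what it's worth, a quick sketch of a loop to keep an eye on the backlog
while the node joins (interval and log path are arbitrary):

  # Snapshot the compaction backlog and active incoming streams once a minute
  while true; do
      date
      nodetool compactionstats | head -1              # "pending tasks: N"
      nodetool netstats | grep -c "Streaming from"    # number of source nodes
      sleep 60
  done | tee -a /tmp/bootstrap-watch.log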




On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com> wrote:

> Yes num_tokens is set to 256. initial_token is blank on all nodes
>> including the new one.
>
>
> Ok so you have num_tokens set to 256 for all nodes with initial_token
> commented out, this means you are using vnodes and the new node will
> automatically grab a list of tokens to take over responsibility for.
>
> Pool Name                    Active   Pending      Completed   Blocked
>>  All time blocked
>> FlushWriter                       0         0           1136         0
>>             512
>>
>> Looks like about 50% of flushes are blocked.
>>
>
> This is a problem as it indicates that the IO system cannot keep up.
>
> Just ran this on the new node:
>> nodetool netstats | grep "Streaming from" | wc -l
>> 10
>
>
> This is normal as the new node will most likely take tokens from all nodes
> in the cluster.
>
> Sorry for the multiple updates, but another thing I found was all the
>> other existing nodes have themselves in the seeds list, but the new node
>> does not have itself in the seeds list. Can that cause this issue?
>
>
> Seeds are only used when a new node is bootstrapping into the cluster and
> needs a set of ips to contact and discover the cluster, so this would have
> no impact on data sizes or streaming. In general it would be considered
> best practice to have a set of 2-3 seeds from each data center, with all
> nodes having the same seed list.
>
>
> What is the current output of 'nodetool compactionstats'? Could you also
> paste the output of nodetool status <keyspace>?
>
> Mark
>
>
>
> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Sorry for the multiple updates, but another thing I found was all the
>> other existing nodes have themselves in the seeds list, but the new node
>> does not have itself in the seeds list. Can that cause this issue?
>>
>>
>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Just ran this on the new node:
>>>
>>> nodetool netstats | grep "Streaming from" | wc -l
>>> 10
>>>
>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>> expected in a vnodes enabled environment?
>>>
>>> Ruchir.
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> Also not sure if this is relevant but just noticed the nodetool tpstats
>>>> output:
>>>>
>>>> Pool Name                    Active   Pending      Completed   Blocked
>>>>  All time blocked
>>>> FlushWriter                       0         0           1136         0
>>>>               512
>>>>
>>>> Looks like about 50% of flushes are blocked.
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>> including the new one.
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>> wrote:
>>>>>
>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>
>>>>>>
>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Patricia for your response!
>>>>>>>
>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>
>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>> 400) Writing Memtable
>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>
>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>> compaction is falling behind?
>>>>>>>
>>>>>>> Ruchir
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>
>>>>>>>> Ruchir,
>>>>>>>>
>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>
>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>
>>>>>>>>> One thing that I noticed that seemed off was the seeds property in
>>>>>>>>> the yaml of the 13th node comprises of 1..12. Where as the seeds property
>>>>>>>>> on the existing 12 nodes consists of all the other nodes except the
>>>>>>>>> thirteenth node. Is this an issue?
>>>>>>>>>
>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>
>>>>>>>>> Ruchir.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Patricia Gorla
>>>>>>>> @patriciagorla
>>>>>>>>
>>>>>>>> Consultant
>>>>>>>> Apache Cassandra Consulting
>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Mark Reddy <ma...@boxever.com>.
>
> Yes num_tokens is set to 256. initial_token is blank on all nodes
> including the new one.


OK, so you have num_tokens set to 256 on all nodes with initial_token
commented out. This means you are using vnodes, and the new node will
automatically pick up a list of tokens to take responsibility for.
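
A quick way to double-check that on each node (a sketch; adjust the
cassandra.yaml path for your install):

  # vnodes are in effect when num_tokens is set and initial_token is left blank
  grep -E 'num_tokens|initial_token' /etc/cassandra/conf/cassandra.yaml
  # on your nodes you should see num_tokens: 256 and initial_token unset/commented out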

Pool Name                    Active   Pending      Completed   Blocked  All
> time blocked
> FlushWriter                       0         0           1136         0
>           512
>
> Looks like about 50% of flushes are blocked.
>

This is a problem as it indicates that the IO system cannot keep up.

Just ran this on the new node:
> nodetool netstats | grep "Streaming from" | wc -l
> 10


This is normal as the new node will most likely take tokens from all nodes
in the cluster.

Sorry for the multiple updates, but another thing I found was all the other
> existing nodes have themselves in the seeds list, but the new node does not
> have itself in the seeds list. Can that cause this issue?


Seeds are only used when a new node is bootstrapping into the cluster and
needs a set of IPs to contact in order to discover the cluster, so this has
no impact on data sizes or streaming. In general it is considered best
practice to have 2-3 seeds from each data center, with all nodes sharing the
same seed list.
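
As an illustration only (the IPs below are just three of your existing nodes
picked arbitrarily, and the cassandra.yaml path depends on your install),
this is the shape of seed configuration to aim for, identical on every node
including the new one:

  grep -A 4 'seed_provider' /etc/cassandra/conf/cassandra.yaml
  # seed_provider:
  #     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
  #       parameters:
  #           - seeds: "10.10.20.27,10.10.20.35,10.10.20.51"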


What is the current output of 'nodetool compactionstats'? Could you also
paste the output of nodetool status <keyspace>?

Mark



On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:

> Sorry for the multiple updates, but another thing I found was all the
> other existing nodes have themselves in the seeds list, but the new node
> does not have itself in the seeds list. Can that cause this issue?
>
>
> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Just ran this on the new node:
>>
>> nodetool netstats | grep "Streaming from" | wc -l
>> 10
>>
>> Seems like the new node is receiving data from 10 other nodes. Is that
>> expected in a vnodes enabled environment?
>>
>> Ruchir.
>>
>>
>>
>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Also not sure if this is relevant but just noticed the nodetool tpstats
>>> output:
>>>
>>> Pool Name                    Active   Pending      Completed   Blocked
>>>  All time blocked
>>> FlushWriter                       0         0           1136         0
>>>             512
>>>
>>> Looks like about 50% of flushes are blocked.
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>> including the new one.
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>> wrote:
>>>>
>>>>> My understanding was that if initial_token is left empty on the new
>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>
>>>>>
>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Patricia for your response!
>>>>>>
>>>>>> On the new node, I just see a lot of the following:
>>>>>>
>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>> 400) Writing Memtable
>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>
>>>>>> so basically it is just busy flushing, and compacting. Would you have
>>>>>> any ideas on why the 2x disk space blow up. My understanding was that if
>>>>>> initial_token is left empty on the new node, it just contacts the heaviest
>>>>>> node and bisects its token range. And the heaviest node is around 2.1 TB,
>>>>>> and the new node is already at 4 TB. Could this be because compaction is
>>>>>> falling behind?
>>>>>>
>>>>>> Ruchir
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>
>>>>>>> Ruchir,
>>>>>>>
>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>> compactions on the new bootstrapping node?
>>>>>>>
>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>> seed nodes per AZ / DC.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>
>>>>>>>> One thing that I noticed that seemed off was the seeds property in
>>>>>>>> the yaml of the 13th node comprises of 1..12. Where as the seeds property
>>>>>>>> on the existing 12 nodes consists of all the other nodes except the
>>>>>>>> thirteenth node. Is this an issue?
>>>>>>>>
>>>>>>>> Any other insight is appreciated?
>>>>>>>>
>>>>>>>> Ruchir.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Patricia Gorla
>>>>>>> @patriciagorla
>>>>>>>
>>>>>>> Consultant
>>>>>>> Apache Cassandra Consulting
>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Sorry for the multiple updates, but another thing I found: all of the
existing nodes include themselves in their own seeds list, while the new node
does not include itself. Could that cause this issue?


On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com> wrote:

> Just ran this on the new node:
>
> nodetool netstats | grep "Streaming from" | wc -l
> 10
>
> Seems like the new node is receiving data from 10 other nodes. Is that
> expected in a vnodes enabled environment?
>
> Ruchir.
>
>
>
> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Also not sure if this is relevant but just noticed the nodetool tpstats
>> output:
>>
>> Pool Name                    Active   Pending      Completed   Blocked
>>  All time blocked
>> FlushWriter                       0         0           1136         0
>>             512
>>
>> Looks like about 50% of flushes are blocked.
>>
>>
>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>> including the new one.
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>> wrote:
>>>
>>>> My understanding was that if initial_token is left empty on the new
>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>
>>>>
>>>> If you are using vnodes and you have num_tokens set to 256 the new node
>>>> will take token ranges dynamically. What is the configuration of your other
>>>> nodes, are you setting num_tokens or initial_token on those?
>>>>
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Patricia for your response!
>>>>>
>>>>> On the new node, I just see a lot of the following:
>>>>>
>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
>>>>> Writing Memtable
>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>
>>>>> so basically it is just busy flushing, and compacting. Would you have
>>>>> any ideas on why the 2x disk space blow up. My understanding was that if
>>>>> initial_token is left empty on the new node, it just contacts the heaviest
>>>>> node and bisects its token range. And the heaviest node is around 2.1 TB,
>>>>> and the new node is already at 4 TB. Could this be because compaction is
>>>>> falling behind?
>>>>>
>>>>> Ruchir
>>>>>
>>>>>
>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>> patricia@thelastpickle.com> wrote:
>>>>>
>>>>>> Ruchir,
>>>>>>
>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>> compactions on the new bootstrapping node?
>>>>>>
>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>> seed nodes per AZ / DC.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>> running major compactions while the streaming is going on?
>>>>>>>
>>>>>>> One thing that I noticed that seemed off was the seeds property in
>>>>>>> the yaml of the 13th node comprises of 1..12. Where as the seeds property
>>>>>>> on the existing 12 nodes consists of all the other nodes except the
>>>>>>> thirteenth node. Is this an issue?
>>>>>>>
>>>>>>> Any other insight is appreciated?
>>>>>>>
>>>>>>> Ruchir.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Patricia Gorla
>>>>>> @patriciagorla
>>>>>>
>>>>>> Consultant
>>>>>> Apache Cassandra Consulting
>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Just ran this on the new node:

nodetool netstats | grep "Streaming from" | wc -l
10

It seems the new node is receiving data from 10 other nodes. Is that
expected in a vnode-enabled environment?

Ruchir.



On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com> wrote:

> Also not sure if this is relevant but just noticed the nodetool tpstats
> output:
>
> Pool Name                    Active   Pending      Completed   Blocked
>  All time blocked
> FlushWriter                       0         0           1136         0
>           512
>
> Looks like about 50% of flushes are blocked.
>
>
> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>> including the new one.
>>
>>
>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>> wrote:
>>
>>> My understanding was that if initial_token is left empty on the new
>>>> node, it just contacts the heaviest node and bisects its token range.
>>>
>>>
>>> If you are using vnodes and you have num_tokens set to 256 the new node
>>> will take token ranges dynamically. What is the configuration of your other
>>> nodes, are you setting num_tokens or initial_token on those?
>>>
>>>
>>> Mark
>>>
>>>
>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>>
>>>> Thanks Patricia for your response!
>>>>
>>>> On the new node, I just see a lot of the following:
>>>>
>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
>>>> Writing Memtable
>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
>>>> (line 262) Compacted 12 sstables to
>>>>
>>>> so basically it is just busy flushing, and compacting. Would you have
>>>> any ideas on why the 2x disk space blow up. My understanding was that if
>>>> initial_token is left empty on the new node, it just contacts the heaviest
>>>> node and bisects its token range. And the heaviest node is around 2.1 TB,
>>>> and the new node is already at 4 TB. Could this be because compaction is
>>>> falling behind?
>>>>
>>>> Ruchir
>>>>
>>>>
>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>> patricia@thelastpickle.com> wrote:
>>>>
>>>>> Ruchir,
>>>>>
>>>>> What exactly are you seeing in the logs? Are you running major
>>>>> compactions on the new bootstrapping node?
>>>>>
>>>>> With respect to the seed list, it is generally advisable to use 3 seed
>>>>> nodes per AZ / DC.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>> running major compactions while the streaming is going on?
>>>>>>
>>>>>> One thing that I noticed that seemed off was the seeds property in
>>>>>> the yaml of the 13th node comprises of 1..12. Where as the seeds property
>>>>>> on the existing 12 nodes consists of all the other nodes except the
>>>>>> thirteenth node. Is this an issue?
>>>>>>
>>>>>> Any other insight is appreciated?
>>>>>>
>>>>>> Ruchir.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Patricia Gorla
>>>>> @patriciagorla
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Also, not sure if this is relevant, but I just noticed the nodetool tpstats
output:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
FlushWriter                       0         0           1136         0               512

It looks like a large fraction of flushes are being blocked (512 blocked
versus 1136 completed, roughly 45%).


On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com> wrote:

> Yes num_tokens is set to 256. initial_token is blank on all nodes
> including the new one.
>
>
> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
> wrote:
>
>> My understanding was that if initial_token is left empty on the new node,
>>> it just contacts the heaviest node and bisects its token range.
>>
>>
>> If you are using vnodes and you have num_tokens set to 256 the new node
>> will take token ranges dynamically. What is the configuration of your other
>> nodes, are you setting num_tokens or initial_token on those?
>>
>>
>> Mark
>>
>>
>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Thanks Patricia for your response!
>>>
>>> On the new node, I just see a lot of the following:
>>>
>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
>>> Writing Memtable
>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
>>> (line 262) Compacted 12 sstables to
>>>
>>> so basically it is just busy flushing, and compacting. Would you have
>>> any ideas on why the 2x disk space blow up. My understanding was that if
>>> initial_token is left empty on the new node, it just contacts the heaviest
>>> node and bisects its token range. And the heaviest node is around 2.1 TB,
>>> and the new node is already at 4 TB. Could this be because compaction is
>>> falling behind?
>>>
>>> Ruchir
>>>
>>>
>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>> patricia@thelastpickle.com> wrote:
>>>
>>>> Ruchir,
>>>>
>>>> What exactly are you seeing in the logs? Are you running major
>>>> compactions on the new bootstrapping node?
>>>>
>>>> With respect to the seed list, it is generally advisable to use 3 seed
>>>> nodes per AZ / DC.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>> running major compactions while the streaming is going on?
>>>>>
>>>>> One thing that I noticed that seemed off was the seeds property in the
>>>>> yaml of the 13th node comprises of 1..12. Where as the seeds property on
>>>>> the existing 12 nodes consists of all the other nodes except the thirteenth
>>>>> node. Is this an issue?
>>>>>
>>>>> Any other insight is appreciated?
>>>>>
>>>>> Ruchir.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Patricia Gorla
>>>> @patriciagorla
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>
>>>
>>>
>>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Yes, num_tokens is set to 256, and initial_token is blank on all nodes,
including the new one.


On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com> wrote:

> My understanding was that if initial_token is left empty on the new node,
>> it just contacts the heaviest node and bisects its token range.
>
>
> If you are using vnodes and you have num_tokens set to 256 the new node
> will take token ranges dynamically. What is the configuration of your other
> nodes, are you setting num_tokens or initial_token on those?
>
>
> Mark
>
>
> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Thanks Patricia for your response!
>>
>> On the new node, I just see a lot of the following:
>>
>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
>> Writing Memtable
>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
>> (line 262) Compacted 12 sstables to
>>
>> so basically it is just busy flushing, and compacting. Would you have any
>> ideas on why the 2x disk space blow up. My understanding was that if
>> initial_token is left empty on the new node, it just contacts the heaviest
>> node and bisects its token range. And the heaviest node is around 2.1 TB,
>> and the new node is already at 4 TB. Could this be because compaction is
>> falling behind?
>>
>> Ruchir
>>
>>
>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>> patricia@thelastpickle.com> wrote:
>>
>>> Ruchir,
>>>
>>> What exactly are you seeing in the logs? Are you running major
>>> compactions on the new bootstrapping node?
>>>
>>> With respect to the seed list, it is generally advisable to use 3 seed
>>> nodes per AZ / DC.
>>>
>>> Cheers,
>>>
>>>
>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster where
>>>> the average data size per node is about 2.1 TB. The bootstrap streaming has
>>>> been going on for 2 days now, and the disk size on the new node is already
>>>> above 4 TB and still going. Is this because the new node is running major
>>>> compactions while the streaming is going on?
>>>>
>>>> One thing that I noticed that seemed off was the seeds property in the
>>>> yaml of the 13th node comprises of 1..12. Where as the seeds property on
>>>> the existing 12 nodes consists of all the other nodes except the thirteenth
>>>> node. Is this an issue?
>>>>
>>>> Any other insight is appreciated?
>>>>
>>>> Ruchir.
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Patricia Gorla
>>> @patriciagorla
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>
>>
>>
>

Re: Node bootstrap

Posted by Mark Reddy <ma...@boxever.com>.
>
> My understanding was that if initial_token is left empty on the new node,
> it just contacts the heaviest node and bisects its token range.


If you are using vnodes and have num_tokens set to 256, the new node will
take token ranges dynamically. What is the configuration of your other
nodes? Are you setting num_tokens or initial_token on those?
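
One way to sanity-check the vnode setup (a sketch; in a vnode cluster
nodetool ring prints one line per token, so counting the lines for a host
gives its token count):

  # 10.10.20.27 here is just one of your nodes, picked arbitrarily
  nodetool ring | grep '10.10.20.27' | wc -l    # expect roughly 256 per node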


Mark


On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:

> Thanks Patricia for your response!
>
> On the new node, I just see a lot of the following:
>
> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
> Writing Memtable
> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
> (line 262) Compacted 12 sstables to
>
> so basically it is just busy flushing, and compacting. Would you have any
> ideas on why the 2x disk space blow up. My understanding was that if
> initial_token is left empty on the new node, it just contacts the heaviest
> node and bisects its token range. And the heaviest node is around 2.1 TB,
> and the new node is already at 4 TB. Could this be because compaction is
> falling behind?
>
> Ruchir
>
>
> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <patricia@thelastpickle.com
> > wrote:
>
>> Ruchir,
>>
>> What exactly are you seeing in the logs? Are you running major
>> compactions on the new bootstrapping node?
>>
>> With respect to the seed list, it is generally advisable to use 3 seed
>> nodes per AZ / DC.
>>
>> Cheers,
>>
>>
>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> I am trying to bootstrap the thirteenth node in a 12 node cluster where
>>> the average data size per node is about 2.1 TB. The bootstrap streaming has
>>> been going on for 2 days now, and the disk size on the new node is already
>>> above 4 TB and still going. Is this because the new node is running major
>>> compactions while the streaming is going on?
>>>
>>> One thing that I noticed that seemed off was the seeds property in the
>>> yaml of the 13th node comprises of 1..12. Where as the seeds property on
>>> the existing 12 nodes consists of all the other nodes except the thirteenth
>>> node. Is this an issue?
>>>
>>> Any other insight is appreciated?
>>>
>>> Ruchir.
>>>
>>>
>>>
>>
>>
>> --
>> Patricia Gorla
>> @patriciagorla
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com <http://thelastpickle.com>
>>
>
>

Re: Node bootstrap

Posted by Ruchir Jha <ru...@gmail.com>.
Thanks Patricia for your response!

On the new node, I just see a lot of the following:

INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
(line 262) Compacted 12 sstables to

so basically it is just busy flushing and compacting. Do you have any idea
why the disk usage has blown up by 2x? My understanding was that if
initial_token is left empty on the new node, it just contacts the heaviest
node and bisects its token range. The heaviest node is around 2.1 TB, yet
the new node is already at 4 TB. Could this be because compaction is
falling behind?
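
In case it helps, this is roughly what I plan to look at to see whether
compaction is the culprit (just a sketch; unthrottling is a temporary
measure on the joining node, not a general recommendation):

  # How far behind is compaction on the new node?
  nodetool compactionstats | grep 'pending tasks'

  # Streamed sstables arrive un-merged, so overlapping data is only reclaimed
  # once compaction catches up; temporarily removing the throttle can help
  nodetool setcompactionthroughput 0    # 0 = unthrottled; restore the old value afterwards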

Ruchir


On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <pa...@thelastpickle.com>
wrote:

> Ruchir,
>
> What exactly are you seeing in the logs? Are you running major compactions
> on the new bootstrapping node?
>
> With respect to the seed list, it is generally advisable to use 3 seed
> nodes per AZ / DC.
>
> Cheers,
>
>
> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> I am trying to bootstrap the thirteenth node in a 12 node cluster where
>> the average data size per node is about 2.1 TB. The bootstrap streaming has
>> been going on for 2 days now, and the disk size on the new node is already
>> above 4 TB and still going. Is this because the new node is running major
>> compactions while the streaming is going on?
>>
>> One thing that I noticed that seemed off was the seeds property in the
>> yaml of the 13th node comprises of 1..12. Where as the seeds property on
>> the existing 12 nodes consists of all the other nodes except the thirteenth
>> node. Is this an issue?
>>
>> Any other insight is appreciated?
>>
>> Ruchir.
>>
>>
>>
>
>
> --
> Patricia Gorla
> @patriciagorla
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com <http://thelastpickle.com>
>

Re: Node bootstrap

Posted by Patricia Gorla <pa...@thelastpickle.com>.
Ruchir,

What exactly are you seeing in the logs? Are you running major compactions
on the new bootstrapping node?

With respect to the seed list, it is generally advisable to use 3 seed
nodes per AZ / DC.

Cheers,


On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com> wrote:

> I am trying to bootstrap the thirteenth node in a 12 node cluster where
> the average data size per node is about 2.1 TB. The bootstrap streaming has
> been going on for 2 days now, and the disk size on the new node is already
> above 4 TB and still going. Is this because the new node is running major
> compactions while the streaming is going on?
>
> One thing that I noticed that seemed off was the seeds property in the
> yaml of the 13th node comprises of 1..12. Where as the seeds property on
> the existing 12 nodes consists of all the other nodes except the thirteenth
> node. Is this an issue?
>
> Any other insight is appreciated?
>
> Ruchir.
>
>
>


-- 
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com <http://thelastpickle.com>