Posted to user@cassandra.apache.org by Ruchir Jha <ru...@gmail.com> on 2014/08/04 19:41:32 UTC
Node bootstrap
I am trying to bootstrap the thirteenth node into a 12-node cluster where the
average data size per node is about 2.1 TB. The bootstrap streaming has
been going on for 2 days now, and the disk usage on the new node is already
above 4 TB and still growing. Is this because the new node is running major
compactions while the streaming is going on?
One thing I noticed that seemed off: the seeds property in the yaml of the
13th node comprises nodes 1..12, whereas the seeds property on the
existing 12 nodes consists of all the other nodes except the thirteenth
node. Is this an issue?
Any other insight is appreciated.
Ruchir.
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Still having issues with node bootstrapping. The new node just died: it
did a full GC, and the nodes it had active streams with noticed it was
down. After the full GC finished, the new node printed this log:
ERROR 02:52:36,259 Stream failed because /10.10.20.35 died or was
restarted/removed (streams may still be active in background, but further
streams won't be started)
Here 10.10.20.35 is an existing node the new node was streaming from. A
similar log was printed for every other node in the cluster. Why did the
new node just exit after the full GC pause?
We have heap dumps enabled on full GCs, and these are the top offenders on
the new node. A new entry I noticed is the CompressionMetadata chunks.
Anything I can do to optimize that?
 num     #instances       #bytes  class name
----------------------------------------------
   1:      42508421   4818885752  [B
   2:      65860543   3161306064  java.nio.HeapByteBuffer
   3:     124361093   2984666232  org.apache.cassandra.io.compress.CompressionMetadata$Chunk
   4:      29745665   1427791920  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   5:      29810362    953931584  org.apache.cassandra.db.Column
   6:         31623    498012768  [Lorg.apache.cassandra.io.compress.CompressionMetadata$Chunk;
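For scale: each CompressionMetadata$Chunk records the position of one compressed block, so the number of chunk entries grows linearly with on-disk data size divided by the compression chunk length. A back-of-the-envelope sketch (the 4 TB figure is from this thread; the 64 KB chunk length is an assumption, the default `chunk_length_kb` unless the tables override it):

```python
# Rough estimate of compression chunk entries implied by a node's SSTables.
# Assumes the default chunk_length_kb of 64; actual table settings may differ.

data_bytes = 4 * 10**12        # ~4 TB currently on the bootstrapping node
chunk_length = 64 * 1024       # 64 KB compression chunk size (assumed default)

chunks = data_bytes // chunk_length
print(f"{chunks:,} chunk entries (~{chunks / 1e6:.0f} million)")
```

That lands in the tens of millions, the same order of magnitude as the histogram above, so the chunk metadata footprint is plausibly just a function of the inflated data volume. Raising `chunk_length_kb` on large tables would shrink the count proportionally (at the cost of larger reads per access), and so would letting compaction reduce the on-disk size.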
On Tue, Aug 5, 2014 at 2:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
> Also, right now the "top" command shows that we are at 500-700% CPU, and
> we have 23 total processors, which means we have a lot of idle CPU left
> over, so throwing more threads at compaction and flush should alleviate the
> problem?
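As a sanity check on that headroom claim (using the 24 logical CPUs that iostat reports elsewhere in this thread), a quick sketch:

```python
# top reports process CPU as a percentage of one core, so on a 24-CPU box
# the ceiling is 2400%. An observed 500-700% leaves most of the machine idle.
cpus = 24
busy_pct = 700                     # upper end of the observed 500-700% range
headroom = cpus * 100 - busy_pct   # idle capacity in top's percentage units
print(f"{headroom}% idle capacity ({headroom / (cpus * 100):.0%} of the machine)")
```

CPU headroom alone does not guarantee that extra flush or compaction threads will help, though: if the bottleneck is IO, as the blocked FlushWriter counters in this thread suggest, more threads just queue on the disks.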
>
>
> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>>
>> Right now, we have 6 flush writers and compaction_throughput_mb_per_sec
>> is set to 0, which I believe disables throttling.
>>
>> Also, Here is the iostat -x 5 5 output:
>>
>>
>> Device:   rrqm/s    wrqm/s     r/s       w/s      rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> sda        10.00   1450.35   50.79     55.92    9775.97   12030.14    204.34      1.56   14.62   1.05  11.21
>> dm-0        0.00      0.00    3.59     18.82     166.52     150.35     14.14      0.44   19.49   0.54   1.22
>> dm-1        0.00      0.00    2.32      5.37      18.56      42.98      8.00      0.76   98.82   0.43   0.33
>> dm-2        0.00      0.00  162.17   5836.66   32714.46   47040.87     13.30      5.57    0.90   0.06  36.00
>> sdb         0.40   4251.90  106.72    107.35   23123.61   35204.09    272.46      4.43   20.68   1.29  27.64
>>
>> avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
>>           14.64   10.75     1.81    13.50    0.00   59.29
>>
>> Device:   rrqm/s    wrqm/s     r/s       w/s      rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> sda        15.40   1344.60   68.80    145.60    4964.80   11790.40     78.15      0.38    1.80   0.80  17.10
>> dm-0        0.00      0.00   43.00   1186.20    2292.80    9489.60      9.59      4.88    3.90   0.09  11.58
>> dm-1        0.00      0.00    1.60      0.00      12.80       0.00      8.00      0.03   16.00   2.00   0.32
>> dm-2        0.00      0.00  197.20  17583.80   35152.00  140664.00      9.89   2847.50  109.52   0.05  93.50
>> sdb        13.20  16552.20  159.00    742.20   32745.60  129129.60    179.62     72.88   66.01   1.04  93.42
>>
>> avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
>>           15.51   19.77     1.97     5.02    0.00   57.73
>>
>> Device:   rrqm/s    wrqm/s     r/s       w/s      rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> sda        16.20    523.40   60.00    285.00    5220.80    5913.60     32.27      0.25    0.72   0.60  20.86
>> dm-0        0.00      0.00    0.80      1.40      32.00      11.20     19.64      0.01    3.18   1.55   0.34
>> dm-1        0.00      0.00    1.60      0.00      12.80       0.00      8.00      0.03   21.00   2.62   0.42
>> dm-2        0.00      0.00  339.40   5886.80   66219.20   47092.80     18.20    251.66  184.72   0.10  63.48
>> sdb         1.00   5025.40  264.20    209.20   60992.00   50422.40    235.35      5.98   40.92   1.23  58.28
>>
>> avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
>>           16.59   16.34     2.03     9.01    0.00   56.04
>>
>> Device:   rrqm/s    wrqm/s     r/s       w/s      rsec/s     wsec/s  avgrq-sz  avgqu-sz   await  svctm  %util
>> sda         5.40    320.00   37.40    159.80    2483.20    3529.60     30.49      0.10    0.52   0.39   7.76
>> dm-0        0.00      0.00    0.20      3.60       1.60      28.80      8.00      0.00    0.68   0.68   0.26
>> dm-1        0.00      0.00    0.00      0.00       0.00       0.00      0.00      0.00    0.00   0.00   0.00
>> dm-2        0.00      0.00  287.20  13108.20   53985.60  104864.00     11.86    869.18   48.82   0.06  76.96
>> sdb         5.20  12163.40  238.20    532.00   51235.20   93753.60    188.25     21.46   23.75   0.97  75.08
>>
>>
>>
>> On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy <ma...@boxever.com>
>> wrote:
>>
>>> Hi Ruchir,
>>>
>>> The large number of blocked flushes and the number of pending
>>> compactions would still indicate IO contention. Can you post the output
>>> of 'iostat -x 5 5'?
>>>
>>> If you do in fact have spare IO, there are several configuration options
>>> you can tune such as increasing the number of flush writers and
>>> compaction_throughput_mb_per_sec
>>>
>>> Mark
>>>
>>>
>>> On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>>
>>>> Also Mark to your comment on my tpstats output, below is my iostat
>>>> output, and the iowait is at 4.59%, which means no IO pressure, but we are
>>>> still seeing the bad flush performance. Should we try increasing the flush
>>>> writers?
>>>>
>>>>
>>>> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014
>>>> _x86_64_ (24 CPU)
>>>>
>>>> avg-cpu: %user %nice %system %iowait %steal %idle
>>>> 5.80 10.25 0.65 4.59 0.00 78.72
>>>>
>>>> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>>>> sda 103.83 9630.62 11982.60 3231174328 4020290310
>>>> dm-0 13.57 160.17 81.12 53739546 27217432
>>>> dm-1 7.59 16.94 43.77 5682200 14686784
>>>> dm-2 5792.76 32242.66 45427.12 10817753530 15241278360
>>>> sdb 206.09 22789.19 33569.27 7646015080 11262843224
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> nodetool status:
>>>>>
>>>>> Datacenter: datacenter1
>>>>> =======================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address      Load     Tokens  Owns (effective)  Host ID                               Rack
>>>>> UN  10.10.20.27  1.89 TB  256     25.4%             76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
>>>>> UN  10.10.20.62  1.83 TB  256     25.5%             84b47313-da75-4519-94f3-3951d554a3e5  rack1
>>>>> UN  10.10.20.47  1.87 TB  256     24.7%             bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
>>>>> UN  10.10.20.45  1.7 TB   256     22.6%             8d6bce33-8179-4660-8443-2cf822074ca4  rack1
>>>>> UN  10.10.20.15  1.86 TB  256     24.5%             01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
>>>>> UN  10.10.20.31  1.87 TB  256     24.9%             1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
>>>>> UN  10.10.20.35  1.86 TB  256     25.8%             17cb8772-2444-46ff-8525-33746514727d  rack1
>>>>> UN  10.10.20.51  1.89 TB  256     25.0%             0343cd58-3686-465f-8280-56fb72d161e2  rack1
>>>>> UN  10.10.20.19  1.91 TB  256     25.5%             30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
>>>>> UN  10.10.20.39  1.93 TB  256     26.0%             b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
>>>>> UN  10.10.20.52  1.81 TB  256     25.4%             6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
>>>>> UN  10.10.20.22  1.89 TB  256     24.8%             46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1
>>>>>
>>>>>
>>>>> Note: The new node is not part of the above list.
>>>>>
>>>>> nodetool compactionstats:
>>>>>
>>>>> pending tasks: 1649
>>>>>  compaction type  keyspace        column family           completed    total         unit   progress
>>>>>       Compaction  iprod           customerorder           1682804084   17956558077   bytes  9.37%
>>>>>       Compaction  prod            gatecustomerorder       1664239271   1693502275    bytes  98.27%
>>>>>       Compaction  qa_config_bkup  fixsessionconfig_hist   2443         27253         bytes  8.96%
>>>>>       Compaction  prod            gatecustomerorder_hist  1770577280   5026699390    bytes  35.22%
>>>>>       Compaction  iprod           gatecustomerorder_hist  2959560205   312350192622  bytes  0.95%
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>>>>> wrote:
>>>>>
>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>> including the new one.
>>>>>>
>>>>>>
>>>>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>>>>> commented out, this means you are using vnodes and the new node will
>>>>>> automatically grab a list of tokens to take over responsibility for.
>>>>>>
>>>>>>> Pool Name                    Active   Pending   Completed   Blocked  All time blocked
>>>>>>> FlushWriter                       0         0        1136         0               512
>>>>>>>
>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>
>>>>>>
>>>>>> This is a problem as it indicates that the IO system cannot keep up.
>>>>>>
>>>>>> Just ran this on the new node:
>>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>>> 10
>>>>>>
>>>>>>
>>>>>> This is normal as the new node will most likely take tokens from all
>>>>>> nodes in the cluster.
>>>>>>
>>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>>
>>>>>>
>>>>>> Seeds are only used when a new node is bootstrapping into the cluster
>>>>>> and needs a set of IPs to contact to discover the cluster, so this
>>>>>> would have no impact on data sizes or streaming. In general it is
>>>>>> considered best practice to have a set of 2-3 seeds from each data
>>>>>> center, with all nodes having the same seed list.
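A hedged sketch of what that recommendation could look like in cassandra.yaml (the addresses are illustrative, taken from the nodetool status output in this thread); every node, including each seed itself, would carry the identical list:

```yaml
# Same seed list on every node in the cluster (2-3 seeds per data center).
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.10.20.27,10.10.20.62,10.10.20.47"
```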
>>>>>>
>>>>>>
>>>>>> What is the current output of 'nodetool compactionstats'? Could you
>>>>>> also paste the output of nodetool status <keyspace>?
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sorry for the multiple updates, but another thing I found was all
>>>>>>> the other existing nodes have themselves in the seeds list, but the new
>>>>>>> node does not have itself in the seeds list. Can that cause this issue?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just ran this on the new node:
>>>>>>>>
>>>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>>>> 10
>>>>>>>>
>>>>>>>> Seems like the new node is receiving data from 10 other nodes. Is
>>>>>>>> that expected in a vnodes enabled environment?
>>>>>>>>
>>>>>>>> Ruchir.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>>>>> tpstats output:
>>>>>>>>>
>>>>>>>>> Pool Name                    Active   Pending   Completed   Blocked  All time blocked
>>>>>>>>> FlushWriter                       0         0        1136         0               512
>>>>>>>>>
>>>>>>>>> Looks like about 50% of flushes are blocked.
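The "about 50%" above is an eyeball of the tpstats counters; depending on whether blocked flushes are compared against completed flushes or against all attempts, a quick recalculation gives:

```python
# FlushWriter counters from the nodetool tpstats output above.
completed = 1136
all_time_blocked = 512

print(f"blocked vs completed:    {all_time_blocked / completed:.0%}")
print(f"blocked vs all attempts: {all_time_blocked / (completed + all_time_blocked):.0%}")
```

Either reading (roughly 45% or 31%) is a substantial fraction, consistent with the IO-contention diagnosis in this thread.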
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>>>>> including the new one.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <
>>>>>>>>>> mark.reddy@boxever.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> My understanding was that if initial_token is left empty on the
>>>>>>>>>>>> new node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the
>>>>>>>>>>> new node will take token ranges dynamically. What is the configuration of
>>>>>>>>>>> your other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mark
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ruchir.jha@gmail.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>>>>
>>>>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>>>>
>>>>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java
>>>>>>>>>>>> (line 400) Writing Memtable
>>>>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>>>>
>>>>>>>>>>>> so basically it is just busy flushing and compacting. Would
>>>>>>>>>>>> you have any ideas on why the disk usage has blown up 2x? My
>>>>>>>>>>>> understanding was that if initial_token is left empty on the new
>>>>>>>>>>>> node, it just contacts the heaviest node and bisects its token
>>>>>>>>>>>> range. The heaviest node is around 2.1 TB, and the new node is
>>>>>>>>>>>> already at 4 TB. Could this be because compaction is falling behind?
>>>>>>>>>>>>
>>>>>>>>>>>> Ruchir
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ruchir,
>>>>>>>>>>>>>
>>>>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>>>>
>>>>>>>>>>>>> With respect to the seed list, it is generally advisable to
>>>>>>>>>>>>> use 3 seed nodes per AZ / DC.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <
>>>>>>>>>>>>> ruchir.jha@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One thing that I noticed that seemed off was the seeds
>>>>>>>>>>>>>> property in the yaml of the 13th node comprises of 1..12. Where as the
>>>>>>>>>>>>>> seeds property on the existing 12 nodes consists of all the other nodes
>>>>>>>>>>>>>> except the thirteenth node. Is this an issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ruchir.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Patricia Gorla
>>>>>>>>>>>>> @patriciagorla
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consultant
>>>>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>>>>
>>>>
>>>> What is the current output of 'nodetool compactionstats'? Could you
>>>> also paste the output of nodetool status <keyspace>?
>>>>
>>>> Mark
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Sorry for the multiple updates, but another thing I found was all the
>>>>> other existing nodes have themselves in the seeds list, but the new node
>>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Just ran this on the new node:
>>>>>>
>>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>>> 10
>>>>>>
>>>>>> Seems like the new node is receiving data from 10 other nodes. Is
>>>>>> that expected in a vnodes enabled environment?
>>>>>>
>>>>>> Ruchir.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>>> tpstats output:
>>>>>>>
>>>>>>> Pool Name Active Pending Completed
>>>>>>> Blocked All time blocked
>>>>>>> FlushWriter 0 0 1136
>>>>>>> 0 512
>>>>>>>
>>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>>> including the new one.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <mark.reddy@boxever.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> My understanding was that if initial_token is left empty on the
>>>>>>>>>> new node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>>
>>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>>
>>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>>>> 400) Writing Memtable
>>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>>
>>>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>>> compaction is falling behind?
>>>>>>>>>>
>>>>>>>>>> Ruchir
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ruchir,
>>>>>>>>>>>
>>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>>
>>>>>>>>>>> With respect to the seed list, it is generally advisable to use
>>>>>>>>>>> 3 seed nodes per AZ / DC.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <
>>>>>>>>>>> ruchir.jha@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>>
>>>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>>
>>>>>>>>>>>> Ruchir.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Patricia Gorla
>>>>>>>>>>> @patriciagorla
>>>>>>>>>>>
>>>>>>>>>>> Consultant
>>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Node bootstrap
Posted by Mark Reddy <ma...@boxever.com>.
Hi Ruchir,
The large number of blocked flushes and the number of pending
compactions would still indicate IO contention. Can you post the output of
'iostat -x 5 5'?
If you do in fact have spare IO, there are several configuration options
you can tune, such as increasing the number of flush writers and raising
compaction_throughput_mb_per_sec.
Mark
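[Editor's aside, not part of the original thread: a quick way to rank devices from the plain `iostat` output posted below is to sort on the tps column; `iostat -x` adds a %util column that answers the question more directly. The sample lines are copied from the thread.]

```python
# Sketch: pick the busiest device from plain `iostat` output.
# Sample rows are taken verbatim from the thread below; `iostat -x`
# would add a %util column, this just ranks devices by transfers/sec.
sample = """\
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 103.83 9630.62 11982.60 3231174328 4020290310
dm-0 13.57 160.17 81.12 53739546 27217432
dm-1 7.59 16.94 43.77 5682200 14686784
dm-2 5792.76 32242.66 45427.12 10817753530 15241278360
sdb 206.09 22789.19 33569.27 7646015080 11262843224
"""

def busiest_device(text):
    # Skip the header line, split columns, take the max by tps (col 1).
    rows = [line.split() for line in text.splitlines()[1:]]
    return max(rows, key=lambda r: float(r[1]))[0]

print(busiest_device(sample))  # dm-2
```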
On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha <ru...@gmail.com> wrote:
> Also Mark to your comment on my tpstats output, below is my iostat output,
> and the iowait is at 4.59%, which means no IO pressure, but we are still
> seeing the bad flush performance. Should we try increasing the flush
> writers?
>
>
> Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014
> _x86_64_ (24 CPU)
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 5.80 10.25 0.65 4.59 0.00 78.72
>
> Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
> sda 103.83 9630.62 11982.60 3231174328 4020290310
> dm-0 13.57 160.17 81.12 53739546 27217432
> dm-1 7.59 16.94 43.77 5682200 14686784
> dm-2 5792.76 32242.66 45427.12 10817753530 15241278360
> sdb 206.09 22789.19 33569.27 7646015080 11262843224
>
>
>
> On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> nodetool status:
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address Load Tokens Owns (effective) Host ID
>> Rack
>> UN 10.10.20.27 1.89 TB 256 25.4%
>> 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1
>> UN 10.10.20.62 1.83 TB 256 25.5%
>> 84b47313-da75-4519-94f3-3951d554a3e5 rack1
>> UN 10.10.20.47 1.87 TB 256 24.7%
>> bcd51a92-3150-41ae-9c51-104ea154f6fa rack1
>> UN 10.10.20.45 1.7 TB 256 22.6%
>> 8d6bce33-8179-4660-8443-2cf822074ca4 rack1
>> UN 10.10.20.15 1.86 TB 256 24.5%
>> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1
>> UN 10.10.20.31 1.87 TB 256 24.9%
>> 1435acf9-c64d-4bcd-b6a4-abcec209815e rack1
>> UN 10.10.20.35 1.86 TB 256 25.8%
>> 17cb8772-2444-46ff-8525-33746514727d rack1
>> UN 10.10.20.51 1.89 TB 256 25.0%
>> 0343cd58-3686-465f-8280-56fb72d161e2 rack1
>> UN 10.10.20.19 1.91 TB 256 25.5%
>> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1
>> UN 10.10.20.39 1.93 TB 256 26.0%
>> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1
>> UN 10.10.20.52 1.81 TB 256 25.4%
>> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1
>> UN 10.10.20.22 1.89 TB 256 24.8%
>> 46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1
>>
>>
>> Note: The new node is not part of the above list.
>>
>> nodetool compactionstats:
>>
>> pending tasks: 1649
>> compaction type keyspace column family completed
>> total unit progress
>> Compaction iprod customerorder 1682804084
>> 17956558077 bytes 9.37%
>> Compaction prodgatecustomerorder
>> 1664239271 1693502275 bytes 98.27%
>> Compaction qa_config_bkupfixsessionconfig_hist
>> 2443 27253 bytes 8.96%
>> Compaction prodgatecustomerorder_hist
>> 1770577280 5026699390 bytes 35.22%
>> Compaction iprodgatecustomerorder_hist
>> 2959560205 312350192622 bytes 0.95%
>>
>>
>>
>>
>> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
>> wrote:
>>
>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>> including the new one.
>>>
>>>
>>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>>> commented out, this means you are using vnodes and the new node will
>>> automatically grab a list of tokens to take over responsibility for.
>>>
>>> Pool Name Active Pending Completed Blocked
>>>> All time blocked
>>>> FlushWriter 0 0 1136 0
>>>> 512
>>>>
>>>> Looks like about 50% of flushes are blocked.
>>>>
>>>
>>> This is a problem as it indicates that the IO system cannot keep up.
>>>
>>> Just ran this on the new node:
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>
>>>
>>> This is normal as the new node will most likely take tokens from all
>>> nodes in the cluster.
>>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>>
>>> Seeds are only used when a new node is bootstrapping into the cluster
>>> and needs a set of ips to contact and discover the cluster, so this would
>>> have no impact on data sizes or streaming. In general it would be
>>> considered best practice to have a set of 2-3 seeds from each data center,
>>> with all nodes having the same seed list.
>>>
>>>
>>> What is the current output of 'nodetool compactionstats'? Could you also
>>> paste the output of nodetool status <keyspace>?
>>>
>>> Mark
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>>
>>>> Sorry for the multiple updates, but another thing I found was all the
>>>> other existing nodes have themselves in the seeds list, but the new node
>>>> does not have itself in the seeds list. Can that cause this issue?
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Just ran this on the new node:
>>>>>
>>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>>> 10
>>>>>
>>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>>> expected in a vnodes enabled environment?
>>>>>
>>>>> Ruchir.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>>> tpstats output:
>>>>>>
>>>>>> Pool Name Active Pending Completed
>>>>>> Blocked All time blocked
>>>>>> FlushWriter 0 0 1136
>>>>>> 0 512
>>>>>>
>>>>>> Looks like about 50% of flushes are blocked.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>>> including the new one.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>>
>>>>>>>>
>>>>>>>> Mark
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Patricia for your response!
>>>>>>>>>
>>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>>
>>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>>> 400) Writing Memtable
>>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>>
>>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>>> compaction is falling behind?
>>>>>>>>>
>>>>>>>>> Ruchir
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ruchir,
>>>>>>>>>>
>>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>>
>>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ruchir.jha@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node
>>>>>>>>>>> cluster where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>>
>>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>>
>>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>>
>>>>>>>>>>> Ruchir.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Patricia Gorla
>>>>>>>>>> @patriciagorla
>>>>>>>>>>
>>>>>>>>>> Consultant
>>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Also, Mark, to your comment on my tpstats output: below is my iostat
output. The iowait is at 4.59%, which suggests no IO pressure, but we are
still seeing the bad flush performance. Should we try increasing the flush
writers?
Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014
_x86_64_ (24 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.80 10.25 0.65 4.59 0.00 78.72
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 103.83 9630.62 11982.60 3231174328 4020290310
dm-0 13.57 160.17 81.12 53739546 27217432
dm-1 7.59 16.94 43.77 5682200 14686784
dm-2 5792.76 32242.66 45427.12 10817753530 15241278360
sdb 206.09 22789.19 33569.27 7646015080 11262843224
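[Editor's aside, not part of the original thread: the aggregate %iowait figure can be misleading on many-core machines. Converting it to "cores' worth of waiting" shows that 4.59% on this 24-CPU box is roughly one full core blocked on IO, so a single hot device (dm-2 above) can be saturated even though the aggregate percentage looks low.]

```python
# Convert aggregate %iowait to an approximate number of CPU cores
# spending their time waiting on IO. Figures are from the iostat
# output above (24 CPUs, 4.59% iowait).
cpus = 24
iowait_pct = 4.59
cores_waiting = cpus * iowait_pct / 100
print(round(cores_waiting, 1))  # ~1.1 cores' worth of IO wait
```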
On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha <ru...@gmail.com> wrote:
> nodetool status:
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns (effective) Host ID
> Rack
> UN 10.10.20.27 1.89 TB 256 25.4%
> 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1
> UN 10.10.20.62 1.83 TB 256 25.5%
> 84b47313-da75-4519-94f3-3951d554a3e5 rack1
> UN 10.10.20.47 1.87 TB 256 24.7%
> bcd51a92-3150-41ae-9c51-104ea154f6fa rack1
> UN 10.10.20.45 1.7 TB 256 22.6%
> 8d6bce33-8179-4660-8443-2cf822074ca4 rack1
> UN 10.10.20.15 1.86 TB 256 24.5%
> 01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1
> UN 10.10.20.31 1.87 TB 256 24.9%
> 1435acf9-c64d-4bcd-b6a4-abcec209815e rack1
> UN 10.10.20.35 1.86 TB 256 25.8%
> 17cb8772-2444-46ff-8525-33746514727d rack1
> UN 10.10.20.51 1.89 TB 256 25.0%
> 0343cd58-3686-465f-8280-56fb72d161e2 rack1
> UN 10.10.20.19 1.91 TB 256 25.5%
> 30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1
> UN 10.10.20.39 1.93 TB 256 26.0%
> b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1
> UN 10.10.20.52 1.81 TB 256 25.4%
> 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1
> UN 10.10.20.22 1.89 TB 256 24.8%
> 46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1
>
>
> Note: The new node is not part of the above list.
>
> nodetool compactionstats:
>
> pending tasks: 1649
> compaction type keyspace column family completed
> total unit progress
> Compaction iprod customerorder 1682804084
> 17956558077 bytes 9.37%
> Compaction prodgatecustomerorder 1664239271
> 1693502275 bytes 98.27%
> Compaction qa_config_bkupfixsessionconfig_hist
> 2443 27253 bytes 8.96%
> Compaction prodgatecustomerorder_hist
> 1770577280 5026699390 bytes 35.22%
> Compaction iprodgatecustomerorder_hist
> 2959560205 312350192622 bytes 0.95%
>
>
>
>
> On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com>
> wrote:
>
>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>> including the new one.
>>
>>
>> Ok so you have num_tokens set to 256 for all nodes with initial_token
>> commented out, this means you are using vnodes and the new node will
>> automatically grab a list of tokens to take over responsibility for.
>>
>> Pool Name Active Pending Completed Blocked
>>> All time blocked
>>> FlushWriter 0 0 1136 0
>>> 512
>>>
>>> Looks like about 50% of flushes are blocked.
>>>
>>
>> This is a problem as it indicates that the IO system cannot keep up.
>>
>> Just ran this on the new node:
>>> nodetool netstats | grep "Streaming from" | wc -l
>>> 10
>>
>>
>> This is normal as the new node will most likely take tokens from all
>> nodes in the cluster.
>>
>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>
>>
>> Seeds are only used when a new node is bootstrapping into the cluster and
>> needs a set of ips to contact and discover the cluster, so this would have
>> no impact on data sizes or streaming. In general it would be considered
>> best practice to have a set of 2-3 seeds from each data center, with all
>> nodes having the same seed list.
>>
>>
>> What is the current output of 'nodetool compactionstats'? Could you also
>> paste the output of nodetool status <keyspace>?
>>
>> Mark
>>
>>
>>
>> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Sorry for the multiple updates, but another thing I found was all the
>>> other existing nodes have themselves in the seeds list, but the new node
>>> does not have itself in the seeds list. Can that cause this issue?
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> Just ran this on the new node:
>>>>
>>>> nodetool netstats | grep "Streaming from" | wc -l
>>>> 10
>>>>
>>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>>> expected in a vnodes enabled environment?
>>>>
>>>> Ruchir.
>>>>
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Also not sure if this is relevant but just noticed the nodetool
>>>>> tpstats output:
>>>>>
>>>>> Pool Name Active Pending Completed Blocked
>>>>> All time blocked
>>>>> FlushWriter 0 0 1136 0
>>>>> 512
>>>>>
>>>>> Looks like about 50% of flushes are blocked.
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>>> including the new one.
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>>
>>>>>>>
>>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>>
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Patricia for your response!
>>>>>>>>
>>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>>
>>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>>> 400) Writing Memtable
>>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>>
>>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>>> compaction is falling behind?
>>>>>>>>
>>>>>>>> Ruchir
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>>
>>>>>>>>> Ruchir,
>>>>>>>>>
>>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>>
>>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>>
>>>>>>>>>> One thing that I noticed that seemed off was the seeds property
>>>>>>>>>> in the yaml of the 13th node comprises of 1..12. Where as the seeds
>>>>>>>>>> property on the existing 12 nodes consists of all the other nodes except
>>>>>>>>>> the thirteenth node. Is this an issue?
>>>>>>>>>>
>>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>>
>>>>>>>>>> Ruchir.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Patricia Gorla
>>>>>>>>> @patriciagorla
>>>>>>>>>
>>>>>>>>> Consultant
>>>>>>>>> Apache Cassandra Consulting
>>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
nodetool status:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID
Rack
UN 10.10.20.27 1.89 TB 256 25.4%
76023cdd-c42d-4068-8b53-ae94584b8b04 rack1
UN 10.10.20.62 1.83 TB 256 25.5%
84b47313-da75-4519-94f3-3951d554a3e5 rack1
UN 10.10.20.47 1.87 TB 256 24.7%
bcd51a92-3150-41ae-9c51-104ea154f6fa rack1
UN 10.10.20.45 1.7 TB 256 22.6%
8d6bce33-8179-4660-8443-2cf822074ca4 rack1
UN 10.10.20.15 1.86 TB 256 24.5%
01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1
UN 10.10.20.31 1.87 TB 256 24.9%
1435acf9-c64d-4bcd-b6a4-abcec209815e rack1
UN 10.10.20.35 1.86 TB 256 25.8%
17cb8772-2444-46ff-8525-33746514727d rack1
UN 10.10.20.51 1.89 TB 256 25.0%
0343cd58-3686-465f-8280-56fb72d161e2 rack1
UN 10.10.20.19 1.91 TB 256 25.5%
30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1
UN 10.10.20.39 1.93 TB 256 26.0%
b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1
UN 10.10.20.52 1.81 TB 256 25.4%
6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1
UN 10.10.20.22 1.89 TB 256 24.8%
46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1
Note: The new node is not part of the above list.
nodetool compactionstats:
pending tasks: 1649
compaction type keyspace column family completed
total unit progress
Compaction iprod customerorder 1682804084
17956558077 bytes 9.37%
Compaction prodgatecustomerorder 1664239271
1693502275 bytes 98.27%
Compaction qa_config_bkupfixsessionconfig_hist
2443 27253 bytes 8.96%
Compaction prodgatecustomerorder_hist
1770577280 5026699390 bytes 35.22%
Compaction iprodgatecustomerorder_hist
2959560205 312350192622 bytes 0.95%
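[Editor's aside, not part of the original thread: the progress column above can be recomputed from the completed/total byte counts, which makes the backlog concrete. The ~312 GB compaction sitting at under 1%, with 1649 tasks pending, is why streamed-in data can pile up on disk roughly 2x faster than it is merged away.]

```python
# Recompute the compactionstats progress column from the byte counts
# quoted above (task names abbreviated; the archive collapsed the
# keyspace/column-family whitespace, so they are shown as printed).
tasks = {
    "customerorder": (1682804084, 17956558077),
    "gatecustomerorder_hist": (2959560205, 312350192622),
}
for name, (done, total) in tasks.items():
    print(name, f"{done / total * 100:.2f}%")
# customerorder 9.37%
# gatecustomerorder_hist 0.95%
```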
On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy <ma...@boxever.com> wrote:
> Yes num_tokens is set to 256. initial_token is blank on all nodes
>> including the new one.
>
>
> Ok so you have num_tokens set to 256 for all nodes with initial_token
> commented out, this means you are using vnodes and the new node will
> automatically grab a list of tokens to take over responsibility for.
>
> Pool Name Active Pending Completed Blocked
>> All time blocked
>> FlushWriter 0 0 1136 0
>> 512
>>
>> Looks like about 50% of flushes are blocked.
>>
>
> This is a problem as it indicates that the IO system cannot keep up.
>
> Just ran this on the new node:
>> nodetool netstats | grep "Streaming from" | wc -l
>> 10
>
>
> This is normal as the new node will most likely take tokens from all nodes
> in the cluster.
>
> Sorry for the multiple updates, but another thing I found was all the
>> other existing nodes have themselves in the seeds list, but the new node
>> does not have itself in the seeds list. Can that cause this issue?
>
>
> Seeds are only used when a new node is bootstrapping into the cluster and
> needs a set of ips to contact and discover the cluster, so this would have
> no impact on data sizes or streaming. In general it would be considered
> best practice to have a set of 2-3 seeds from each data center, with all
> nodes having the same seed list.
>
>
> What is the current output of 'nodetool compactionstats'? Could you also
> paste the output of nodetool status <keyspace>?
>
> Mark
>
>
>
> On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha <ru...@gmail.com> wrote:
>
>> Sorry for the multiple updates, but another thing I found was all the
>> other existing nodes have themselves in the seeds list, but the new node
>> does not have itself in the seeds list. Can that cause this issue?
>>
>>
>> On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com> wrote:
>>
>>> Just ran this on the new node:
>>>
>>> nodetool netstats | grep "Streaming from" | wc -l
>>> 10
>>>
>>> Seems like the new node is receiving data from 10 other nodes. Is that
>>> expected in a vnodes enabled environment?
>>>
>>> Ruchir.
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com>
>>> wrote:
>>>
>>>> Also not sure if this is relevant but just noticed the nodetool tpstats
>>>> output:
>>>>
>>>> Pool Name Active Pending Completed Blocked
>>>> All time blocked
>>>> FlushWriter 0 0 1136 0
>>>> 512
>>>>
>>>> Looks like about 50% of flushes are blocked.
>>>>
>>>>
>>>> On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes num_tokens is set to 256. initial_token is blank on all nodes
>>>>> including the new one.
>>>>>
>>>>>
>>>>> On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com>
>>>>> wrote:
>>>>>
>>>>>> My understanding was that if initial_token is left empty on the new
>>>>>>> node, it just contacts the heaviest node and bisects its token range.
>>>>>>
>>>>>>
>>>>>> If you are using vnodes and you have num_tokens set to 256 the new
>>>>>> node will take token ranges dynamically. What is the configuration of your
>>>>>> other nodes, are you setting num_tokens or initial_token on those?
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Patricia for your response!
>>>>>>>
>>>>>>> On the new node, I just see a lot of the following:
>>>>>>>
>>>>>>> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
>>>>>>> 400) Writing Memtable
>>>>>>> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
>>>>>>> CompactionTask.java (line 262) Compacted 12 sstables to
>>>>>>>
>>>>>>> so basically it is just busy flushing, and compacting. Would you
>>>>>>> have any ideas on why the 2x disk space blow up. My understanding was that
>>>>>>> if initial_token is left empty on the new node, it just contacts the
>>>>>>> heaviest node and bisects its token range. And the heaviest node is around
>>>>>>> 2.1 TB, and the new node is already at 4 TB. Could this be because
>>>>>>> compaction is falling behind?
>>>>>>>
>>>>>>> Ruchir
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <
>>>>>>> patricia@thelastpickle.com> wrote:
>>>>>>>
>>>>>>>> Ruchir,
>>>>>>>>
>>>>>>>> What exactly are you seeing in the logs? Are you running major
>>>>>>>> compactions on the new bootstrapping node?
>>>>>>>>
>>>>>>>> With respect to the seed list, it is generally advisable to use 3
>>>>>>>> seed nodes per AZ / DC.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha <ru...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am trying to bootstrap the thirteenth node in a 12 node cluster
>>>>>>>>> where the average data size per node is about 2.1 TB. The bootstrap
>>>>>>>>> streaming has been going on for 2 days now, and the disk size on the new
>>>>>>>>> node is already above 4 TB and still going. Is this because the new node is
>>>>>>>>> running major compactions while the streaming is going on?
>>>>>>>>>
>>>>>>>>> One thing that I noticed that seemed off was the seeds property in
>>>>>>>>> the yaml of the 13th node comprises of 1..12. Where as the seeds property
>>>>>>>>> on the existing 12 nodes consists of all the other nodes except the
>>>>>>>>> thirteenth node. Is this an issue?
>>>>>>>>>
>>>>>>>>> Any other insight is appreciated?
>>>>>>>>>
>>>>>>>>> Ruchir.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Patricia Gorla
>>>>>>>> @patriciagorla
>>>>>>>>
>>>>>>>> Consultant
>>>>>>>> Apache Cassandra Consulting
>>>>>>>> http://www.thelastpickle.com <http://thelastpickle.com>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Node bootstrap
Posted by Mark Reddy <ma...@boxever.com>.
>
> Yes num_tokens is set to 256. initial_token is blank on all nodes
> including the new one.
OK, so you have num_tokens set to 256 for all nodes with initial_token
commented out; this means you are using vnodes, and the new node will
automatically grab a list of tokens to take responsibility for.
Pool Name Active Pending Completed Blocked All
> time blocked
> FlushWriter 0 0 1136 0
> 512
>
> Looks like about 50% of flushes are blocked.
>
This is a problem, as it indicates that the I/O system cannot keep up.
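To put a number on it, here is a rough sketch in Python that computes the
blocked fraction from a tpstats line. The column layout (pool name, active,
pending, completed, blocked, all-time blocked) is assumed from the output
pasted above:

```python
# Rough sketch: compute what fraction of a pool's completed tasks were ever
# blocked, given one line of `nodetool tpstats` output. Column order is
# assumed: name, active, pending, completed, blocked, all-time blocked.
def blocked_ratio(tpstats_line):
    name, _active, _pending, completed, _blocked, all_time = tpstats_line.split()
    return name, int(all_time) / int(completed)

pool, ratio = blocked_ratio("FlushWriter 0 0 1136 0 512")
print(f"{pool}: {ratio:.0%} of flushes were blocked at some point")
```

With the numbers you pasted, that works out to roughly 45%, which is why the
flush pipeline looks backed up.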
Just ran this on the new node:
> nodetool netstats | grep "Streaming from" | wc -l
> 10
This is normal as the new node will most likely take tokens from all nodes
in the cluster.
Sorry for the multiple updates, but another thing I found was all the other
> existing nodes have themselves in the seeds list, but the new node does not
> have itself in the seeds list. Can that cause this issue?
Seeds are only used when a new node is bootstrapping into the cluster and
needs a set of IPs to contact in order to discover the cluster, so this would
have no impact on data sizes or streaming. In general, it is considered best
practice to use 2-3 seeds from each data center, with all nodes sharing the
same seed list.
What is the current output of 'nodetool compactionstats'? Could you also
paste the output of nodetool status <keyspace>?
Mark
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Sorry for the multiple updates, but another thing I found is that all the
existing nodes have themselves in their own seeds list, while the new node
does not. Could that be causing this issue?
On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha <ru...@gmail.com> wrote:
> Just ran this on the new node:
>
> nodetool netstats | grep "Streaming from" | wc -l
> 10
>
> Seems like the new node is receiving data from 10 other nodes. Is that
> expected in a vnodes enabled environment?
>
> Ruchir.
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Just ran this on the new node:
nodetool netstats | grep "Streaming from" | wc -l
10
Seems like the new node is receiving data from 10 other nodes. Is that
expected in a vnodes-enabled environment?
Ruchir.
On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha <ru...@gmail.com> wrote:
> Also not sure if this is relevant but just noticed the nodetool tpstats
> output:
>
> Pool Name Active Pending Completed Blocked
> All time blocked
> FlushWriter 0 0 1136 0
> 512
>
> Looks like about 50% of flushes are blocked.
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Also, not sure if this is relevant, but I just noticed this in the nodetool
tpstats output:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
FlushWriter                       0         0           1136         0               512

Looks like about 45% of flushes (512 of 1136) were blocked at some point.
On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha <ru...@gmail.com> wrote:
> Yes num_tokens is set to 256. initial_token is blank on all nodes
> including the new one.
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Yes num_tokens is set to 256. initial_token is blank on all nodes including
the new one.
On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy <ma...@boxever.com> wrote:
> My understanding was that if initial_token is left empty on the new node,
>> it just contacts the heaviest node and bisects its token range.
>
>
> If you are using vnodes and you have num_tokens set to 256 the new node
> will take token ranges dynamically. What is the configuration of your other
> nodes, are you setting num_tokens or initial_token on those?
>
>
> Mark
Re: Node bootstrap
Posted by Mark Reddy <ma...@boxever.com>.
>
> My understanding was that if initial_token is left empty on the new node,
> it just contacts the heaviest node and bisects its token range.
If you are using vnodes and have num_tokens set to 256, the new node will
take over token ranges dynamically. What is the configuration of your other
nodes: are you setting num_tokens or initial_token on those?
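As a rough illustration (this mirrors the idea, not Cassandra's actual
implementation), vnode token selection amounts to picking num_tokens random
points on the Murmur3 ring:

```python
# Sketch: with initial_token blank and num_tokens: 256, the joining node
# effectively picks 256 random tokens across the partitioner's range, so it
# owns many small ranges scattered around the ring, and therefore streams
# from many peers at once rather than bisecting a single node's range.
import random

MURMUR3_MIN = -(2 ** 63)   # Murmur3Partitioner token range
MURMUR3_MAX = 2 ** 63 - 1

def pick_vnode_tokens(num_tokens=256, rng=random):
    return sorted(rng.randint(MURMUR3_MIN, MURMUR3_MAX)
                  for _ in range(num_tokens))

tokens = pick_vnode_tokens()
assert len(tokens) == 256
```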
Mark
On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha <ru...@gmail.com> wrote:
> Thanks Patricia for your response!
>
> On the new node, I just see a lot of the following:
>
> INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
> Writing Memtable
> INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
> (line 262) Compacted 12 sstables to
>
> so basically it is just busy flushing, and compacting. Would you have any
> ideas on why the 2x disk space blow up. My understanding was that if
> initial_token is left empty on the new node, it just contacts the heaviest
> node and bisects its token range. And the heaviest node is around 2.1 TB,
> and the new node is already at 4 TB. Could this be because compaction is
> falling behind?
>
> Ruchir
Re: Node bootstrap
Posted by Ruchir Jha <ru...@gmail.com>.
Thanks Patricia for your response!
On the new node, I just see a lot of the following:
INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
(line 262) Compacted 12 sstables to
So basically it is just busy flushing and compacting. Would you have any idea
why the disk usage has blown up 2x? My understanding was that if
initial_token is left empty on the new node, it just contacts the heaviest
node and bisects its token range. The heaviest node is around 2.1 TB, yet the
new node is already at 4 TB. Could this be because compaction is falling
behind?
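One way compaction falling behind could explain it, as a back-of-the-envelope
sketch: size-tiered compaction keeps its input sstables on disk until the
compacted output is fully written, so if compaction is far behind, much of
the streamed data can exist twice. The backlog fraction below is a made-up
parameter, not something measured on this cluster.

```python
# Back-of-the-envelope: streamed data plus a duplicate copy of whatever is
# currently tied up in pending/overlapping compactions. The backlog fraction
# is hypothetical, purely for illustration.
def peak_disk_tb(streamed_tb, backlog_fraction):
    return streamed_tb * (1 + backlog_fraction)

# ~2.1 TB streamed with ~90% of it still queued in compactions approaches
# twice the steady-state size:
print(peak_disk_tb(2.1, 0.9))
```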
Ruchir
On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla <pa...@thelastpickle.com>
wrote:
> Ruchir,
>
> What exactly are you seeing in the logs? Are you running major compactions
> on the new bootstrapping node?
>
> With respect to the seed list, it is generally advisable to use 3 seed
> nodes per AZ / DC.
>
> Cheers,
>
Re: Node bootstrap
Posted by Patricia Gorla <pa...@thelastpickle.com>.
Ruchir,
What exactly are you seeing in the logs? Are you running major compactions
on the new bootstrapping node?
With respect to the seed list, it is generally advisable to use 3 seed
nodes per AZ / DC.
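For reference, that would look something like this in cassandra.yaml on every
node, seeds included (the addresses here are hypothetical):

```yaml
# Same 2-3 seeds per DC on every node, including the seed nodes themselves.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.1,10.0.1.2,10.0.1.3"
```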
Cheers,
--
Patricia Gorla
@patriciagorla
Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com <http://thelastpickle.com>