Posted to user@cassandra.apache.org by Vickrum Loi <vi...@idioplatform.com> on 2016/01/06 16:26:47 UTC
New node has high network and disk usage.
Hi,
We recently added a new node to our cluster in order to replace a node that
died (hardware failure we believe). For the next two weeks it had high disk
and network activity. We replaced the server, but it's happened again.
We've looked into memory allowances, disk performance, number of
connections, and all the nodetool stats, but can't find the cause of the
issue.
`nodetool tpstats`[0] shows a lot of active and pending threads, in
comparison to the rest of the cluster, but that's likely a symptom, not a
cause.
`nodetool status`[1] shows the cluster isn't quite balanced. The bad node
(D) has less data.
Disk activity[2] and network activity[3] on this node are far higher than
the rest.
The only other difference between this node and the rest of the cluster is
that it's on the ext4 filesystem, whereas the rest are ext3, but we've done
plenty of testing there and can't see how that would affect performance on
this node so much.
Nothing of note in system.log.
What should our next step be in trying to diagnose this issue?
Best wishes,
Vic
[0] `nodetool tpstats` output:
Good node:
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
ReadStage                         0         0    46311521         0                  0
RequestResponseStage              0         0    23817366         0                  0
MutationStage                     0         0    47389269         0                  0
ReadRepairStage                   0         0       11108         0                  0
ReplicateOnWriteStage             0         0           0         0                  0
GossipStage                       0         0     5259908         0                  0
CacheCleanupExecutor              0         0           0         0                  0
MigrationStage                    0         0          30         0                  0
MemoryMeter                       0         0       16563         0                  0
FlushWriter                       0         0       39637         0                 26
ValidationExecutor                0         0       19013         0                  0
InternalResponseStage             0         0           9         0                  0
AntiEntropyStage                  0         0       38026         0                  0
MemtablePostFlusher               0         0       81740         0                  0
MiscStage                         0         0       19196         0                  0
PendingRangeCalculator            0         0          23         0                  0
CompactionExecutor                0         0       61629         0                  0
commitlog_archiver                0         0           0         0                  0
HintedHandoff                     0         0          63         0                  0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 640
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Bad node:
Pool Name                    Active   Pending   Completed   Blocked   All time blocked
ReadStage                        32       113       52216         0                  0
RequestResponseStage              0         0        4167         0                  0
MutationStage                     0         0      127559         0                  0
ReadRepairStage                   0         0         125         0                  0
ReplicateOnWriteStage             0         0           0         0                  0
GossipStage                       0         0        9965         0                  0
CacheCleanupExecutor              0         0           0         0                  0
MigrationStage                    0         0           0         0                  0
MemoryMeter                       0         0          24         0                  0
FlushWriter                       0         0          27         0                  1
ValidationExecutor                0         0           0         0                  0
InternalResponseStage             0         0           0         0                  0
AntiEntropyStage                  0         0           0         0                  0
MemtablePostFlusher               0         0          96         0                  0
MiscStage                         0         0           0         0                  0
PendingRangeCalculator            0         0          10         0                  0
CompactionExecutor                1         1          73         0                  0
commitlog_archiver                0         0           0         0                  0
HintedHandoff                     0         0          15         0                  0
Message type Dropped
RANGE_SLICE 130
READ_REPAIR 1
PAGED_RANGE 0
BINARY 0
READ 31032
MUTATION 865
_TRACE 0
REQUEST_RESPONSE 7
COUNTER_MUTATION 0
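The clearest signal in the two tpstats dumps is the dropped-message section. As a
minimal sketch (a hypothetical helper, not part of nodetool), the drop counts can be
pulled out of `nodetool tpstats` output and anything non-zero flagged:

```python
def parse_dropped(tpstats_text):
    """Parse the 'Message type / Dropped' section of `nodetool tpstats` output."""
    dropped = {}
    in_section = False
    for line in tpstats_text.splitlines():
        if line.strip().startswith("Message type"):
            in_section = True
            continue
        if in_section:
            parts = line.split()
            # Rows in this section are exactly "<MESSAGE_TYPE> <count>"
            if len(parts) == 2 and parts[1].isdigit():
                dropped[parts[0]] = int(parts[1])
    return dropped

# Sample taken from the bad node's output above
bad_node = """\
Message type           Dropped
RANGE_SLICE                130
READ_REPAIR                  1
READ                     31032
MUTATION                   865
REQUEST_RESPONSE             7
"""

# Non-zero drops point at an overloaded node; READ drops dominate here
print({k: v for k, v in parse_dropped(bad_node).items() if v > 0})
```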
[1] `nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address      Load       Tokens   Owns    Host ID                                Rack
UN A (Good)     252.37 GB  256      23.0%   9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f   rack1
UN B (Good)     245.91 GB  256      24.4%   6f0cfff2-babe-4de2-a1e3-6201228dee44   rack1
UN C (Good)     254.79 GB  256      23.7%   f4891729-9179-4f19-ab2c-50d387da7ac6   rack1
UN D (Bad)      163.85 GB  256      28.8%   faa5b073-6af4-4c80-b280-e7fdd61924d3   rack1
[2] Disk read/write ops:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
Re: New node has high network and disk usage.
Posted by Kai Wang <de...@gmail.com>.
James,
Thanks for sharing. Anyway, good to know there's one more thing to add to
the checklist.
On Sun, Jan 17, 2016 at 12:23 PM, James Griffin <
james.griffin@idioplatform.com> wrote:
> Hi all,
>
> Just to let you know, we finally figured this out on Friday. It turns out
> the new nodes had an older version of the kernel installed. Upgrading the
> kernel solved our issues. For reference, the "bad" kernel was
> 3.2.0-75-virtual, upgrading to 3.2.0-86-virtual resolved the issue. We
> still don't fully understand why this kernel bug didn't affect *all* our
> nodes (in the end we had three nodes with that kernel, only two of them
> exhibited this issue), but there we go.
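The fix above amounts to a fleet-wide kernel version check. A minimal sketch
(hypothetical helper; the release strings are the ones from this thread) that
compares Ubuntu-style `uname -r` strings so nodes below a known-good version
can be flagged:

```python
def kernel_key(release):
    """Turn a release like '3.2.0-75-virtual' into a sortable tuple (3, 2, 0, 75)."""
    base, _, _flavour = release.partition("-virtual")
    version, _, abi = base.rpartition("-")
    major, minor, patch = (int(x) for x in version.split("."))
    return (major, minor, patch, int(abi))

KNOWN_GOOD = kernel_key("3.2.0-86-virtual")

# Hypothetical fleet snapshot; in practice each value comes from `uname -r` per node
fleet = {
    "node1": "3.2.0-86-virtual",
    "node2": "3.2.0-75-virtual",
    "node3": "3.2.0-75-virtual",
}
for node, release in sorted(fleet.items()):
    verdict = "ok" if kernel_key(release) >= KNOWN_GOOD else "needs upgrade"
    print(node, release, verdict)
```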
>
> Thanks everyone for your help
>
> Cheers,
> Griff
>
> On 14 January 2016 at 15:14, James Griffin <james.griffin@idioplatform.com
> > wrote:
>
>> Hi Kai,
>>
>> Well observed - running `nodetool status` without specifying keyspace
>> does report ~33% on each node. We have two keyspaces on this cluster - if I
>> specify either of them the ownership reported by each node is 100%, so I
>> believe the repair completed successfully.
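The arithmetic behind the two figures is worth spelling out: with a keyspace
named, nodetool reports effective ownership (token share times the keyspace's
replication factor, capped at 100%); without one it reports raw token share. A
sketch for this 3-node, RF=3 ring:

```python
def effective_ownership(token_share, rf):
    """Effective data ownership once replication is accounted for, capped at 100%."""
    return min(token_share * rf, 1.0)

nodes, rf = 3, 3
token_share = 1 / nodes  # the ~33% shown when no keyspace is specified
print(f"raw: {token_share:.0%}, effective: {effective_ownership(token_share, rf):.0%}")
```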
>>
>> Best wishes,
>>
>> Griff
>>
>> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
>> CTO
>> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 |
>> Twitter: @imaginaryroots <http://twitter.com/imaginaryroots> | Skype:
>> j.s.griffin
>> idio helps major brands and publishers to build closer relationships with
>> their customers and prospects by learning from their content consumption
>> and acting on that insight. We call it Content Intelligence, and it
>> integrates with your existing marketing technology to provide detailed
>> customer interest profiles in real-time across all channels, and to
>> personalize content into every channel for every customer. See
>> http://idioplatform.com
>> for more information.
>>
>> On 14 January 2016 at 15:08, Kai Wang <de...@gmail.com> wrote:
>>
>>> James,
>>>
>>> I may miss something. You mentioned your cluster had RF=3. Then why
>>> does "nodetool status" show each node owns 1/3 of the data especially after
>>> a full repair?
>>>
>>> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
>>> james.griffin@idioplatform.com> wrote:
>>>
>>>> Hi Kai,
>>>>
>>>> Below - nothing going on that I can see
>>>>
>>>> $ nodetool netstats
>>>> Mode: NORMAL
>>>> Not sending any streams.
>>>> Read Repair Statistics:
>>>> Attempted: 0
>>>> Mismatch (Blocking): 0
>>>> Mismatch (Background): 0
>>>> Pool Name Active Pending Completed
>>>> Commands n/a 0 6326
>>>> Responses n/a 0 219356
>>>>
>>>>
>>>>
>>>> Best wishes,
>>>>
>>>> Griff
>>>>
>>>>
>>>> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>>>>
>>>>> James,
>>>>>
>>>>> Can you post the result of "nodetool netstats" on the bad node?
>>>>>
>>>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>>>> james.griffin@idioplatform.com> wrote:
>>>>>
>>>>>> A summary of what we've done this morning:
>>>>>>
>>>>>> - Noted that there are no GCInspector lines in system.log on the bad
>>>>>> node (there are GCInspector logs on other healthy nodes)
>>>>>> - Turned on GC logging; noted logs stating that the total time for
>>>>>> which application threads were stopped was high - ~10s.
>>>>>> - Not seeing failures of any kind (promotion or concurrent mark)
>>>>>> - Attached VisualVM: noted that heap usage was very low (~5%
>>>>>> usage and stable) and it didn't display the hallmarks of GC activity.
>>>>>> PermGen also very stable
>>>>>> - Downloaded GC logs and examined them in GCViewer. Noted that:
>>>>>> - We had lots of pauses (again around 10s), but no full GC.
>>>>>> - From a 2,300s sample, just over 2,000s were spent with
>>>>>> threads paused
>>>>>> - Spotted many small GCs in the new space - realised that the Xmn
>>>>>> value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>>>> 937M - no change in server behaviour (high load, high reads/s on disk,
>>>>>> high CPU wait)
>>>>>>
>>>>>> Current output of jstat:
>>>>>>
>>>>>> S0 S1 E O P YGC YGCT FGC FGCT
>>>>>> GCT
>>>>>> 2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039
>>>>>> 63.724
>>>>>> 3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103
>>>>>> 1.915
>>>>>>
>>>>>> Correct me if I'm wrong, but it seems 3 is a lot healthier GC-wise
>>>>>> than 2 (which has normal load statistics).
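One way to sanity-check this: jstat's YGCT and FGCT columns are cumulative
seconds, so GC overhead over a sample is simply total GC time divided by wall
time. A sketch using the figures above (assuming the 2,300s GC-log sample and
these jstat counters cover roughly the same window):

```python
def gc_overhead(ygct, fgct, wall_seconds):
    """Fraction of wall-clock time spent in GC pauses, from cumulative jstat counters."""
    return (ygct + fgct) / wall_seconds

# Node 3's jstat line: YGCT=1.812s, FGCT=0.103s - well under 1% of a 2,300s
# sample, so the ~2,000s of stopped application threads cannot be GC pauses.
print(f"node 3: {gc_overhead(1.812, 0.103, 2300):.2%}")
print(f"node 2: {gc_overhead(63.684, 0.039, 2300):.2%}")
```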
>>>>>>
>>>>>> Anywhere else you can recommend we look?
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>>>>> cause for that.
>>>>>>> Can you just search for the word GCInspector in system.log and share
>>>>>>> the frequency of minor and full gc? Moreover, are you printing
>>>>>>> promotion failures in the gc logs? Why is full gc getting triggered -
>>>>>>> promotion failures or concurrent mode failures?
>>>>>>>
>>>>>>> If you are on CMS, you need to fine tune your heap options to
>>>>>>> address full gc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>> I think I was incorrect in assuming GC wasn't an issue due to the
>>>>>>> lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly
>>>>>>> marked differences, though comparing the startup flags on the two
>>>>>>> machines shows the GC config is identical:
>>>>>>>
>>>>>>> $ jstat -gcutil
>>>>>>> S0 S1 E O P YGC YGCT FGC FGCT
>>>>>>> GCT
>>>>>>> 2 5.08 0.00 55.72 18.24 59.90 25986 619.827 28 1.597
>>>>>>> 621.424
>>>>>>> 3 0.00 0.00 22.79 17.87 59.99 422600 11225.979 668 57.383
>>>>>>> 11283.361
>>>>>>>
>>>>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>>>>
>>>>>>> $ iostat -dmx md0
>>>>>>>
>>>>>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
>>>>>>> avgrq-sz avgqu-sz await r_await w_await svctm %util
>>>>>>> 2 md0 0.00 0.00 339.00 0.00 9.77 0.00
>>>>>>> 59.00 0.00 0.00 0.00 0.00 0.00 0.00
>>>>>>> 3 md0 0.00 0.00 2069.00 1.00 85.85 0.00
>>>>>>> 84.94 0.00 0.00 0.00 0.00 0.00 0.00
>>>>>>>
>>>>>>> Griff
>>>>>>>
>>>>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Node 2 has slightly more data but that should be ok. Not sure how
>>>>>>>> read ops are so high when no IO-intensive activity such as repair or
>>>>>>>> compaction is running on node 3. Maybe you can try investigating the
>>>>>>>> logs to see what's happening.
>>>>>>>>
>>>>>>>> Others on the mailing list could also share their views on the
>>>>>>>> situation.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anuj
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>
>>>>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>>> Hi Anuj,
>>>>>>>>
>>>>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>>>>> following the instructions in the Datastax documentation for replacing
>>>>>>>> running nodes, since the nodes were running fine; it was just that the
>>>>>>>> servers had been incorrectly initialised and thus had less disk space.
>>>>>>>> The status below shows 2 has significantly higher load; however, as I
>>>>>>>> say, 2 is operating normally and is running compactions, so I guess
>>>>>>>> that's not an issue?
>>>>>>>>
>>>>>>>> Datacenter: datacenter1
>>>>>>>> =======================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> -- Address Load Tokens Owns Host ID
>>>>>>>> Rack
>>>>>>>> UN 1 253.59 GB 256 31.7%
>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>> UN 2 302.23 GB 256 35.3%
>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>> UN 3 265.02 GB 256 33.1%
>>>>>>>> 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>>>>>>>>
>>>>>>>> Griff
>>>>>>>>
>>>>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Revisiting the thread, I can see that nodetool status had both good
>>>>>>>>> and bad nodes at the same time. How do you replace nodes? When you
>>>>>>>>> say bad node, I understand that the node is no longer usable even
>>>>>>>>> though Cassandra is UP - is that correct?
>>>>>>>>>
>>>>>>>>> If a node is in bad shape and not working, adding a new node may
>>>>>>>>> trigger streaming huge amounts of data from the bad node too. Have
>>>>>>>>> you considered using the procedure for replacing a dead node?
>>>>>>>>>
>>>>>>>>> Please share the latest nodetool status.
>>>>>>>>>
>>>>>>>>> nodetool output shared earlier:
>>>>>>>>>
>>>>>>>>> `nodetool status` output:
>>>>>>>>>
>>>>>>>>> Status=Up/Down
>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>> -- Address Load Tokens Owns Host
>>>>>>>>> ID Rack
>>>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Anuj
>>>>>>>>>
>>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>>
>>>>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> We’ve spent a few days running things but are in the same
>>>>>>>>> position. To add some more flavour:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>>>> running in this configuration for a few years without any real issues
>>>>>>>>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>>>> brought in to replace two other nodes which had failed RAID0 configuration
>>>>>>>>> and thus were lacking in disk space.
>>>>>>>>> - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>>>> wait, IO and load metrics
>>>>>>>>> - We subsequently brought 3 into the ring: as soon as 3 was
>>>>>>>>> fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>>>> levels. Those same stats on 3, however, sky-rocketed
>>>>>>>>> - We’ve confirmed configuration across all three nodes are
>>>>>>>>> identical and in line with the recommended production settings
>>>>>>>>> - We’ve run a full repair
>>>>>>>>> - Node 2 is currently running compactions, 1 & 3 aren’t and
>>>>>>>>> have no pending
>>>>>>>>> - There is no GC happening from what I can see. Node 1 has a
>>>>>>>>> GC log, but that’s not been written to since May last year
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What we’re seeing at the moment is similar and normal stats on
>>>>>>>>> nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>>>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>>>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you recommend any next steps?
>>>>>>>>>
>>>>>>>>> Griff
>>>>>>>>>
>>>>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Vickrum,
>>>>>>>>>>
>>>>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>>>>
>>>>>>>>>> 1. Analyse the sar report to check system health - cpu, memory,
>>>>>>>>>> swap, disk etc.
>>>>>>>>>> The system seems to be overloaded. This is evident from the
>>>>>>>>>> mutation drops.
>>>>>>>>>>
>>>>>>>>>> 2. Make sure that all recommended Cassandra production settings
>>>>>>>>>> available on the Datastax site are applied; disable zone reclaim
>>>>>>>>>> and THP.
>>>>>>>>>>
>>>>>>>>>> 3. Run a full repair on the bad node and check the data size. The
>>>>>>>>>> node owns the largest token range but has significantly less data;
>>>>>>>>>> I doubt that bootstrapping happened properly.
>>>>>>>>>>
>>>>>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>>>>>> compactions by reducing concurrent compactors or compaction
>>>>>>>>>> throughput.
>>>>>>>>>>
>>>>>>>>>> 5. Analyse the logs to make sure bootstrapping happened without
>>>>>>>>>> errors.
>>>>>>>>>>
>>>>>>>>>> 6. Look for other common performance problems such as GC pauses to
>>>>>>>>>> make sure that dropped mutations are not caused by GC pauses.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Anuj
>>>>>>>>>>
>>>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>>>
>>>>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>>>>> # nodetool compactionstats
>>>>>>>>>> pending tasks: 22
>>>>>>>>>> compaction type   keyspace               table                      completed   total         unit   progress
>>>>>>>>>> Compaction        production_analytics   interactions               240410213   161172668724  bytes  0.15%
>>>>>>>>>> Compaction        production_decisions   decisions.decisions_q_idx  120815385   226295183     bytes  53.39%
>>>>>>>>>> Active compaction remaining time : 2h39m58s
>>>>>>>>>>
>>>>>>>>>> Worth mentioning that compactions haven't been running on this
>>>>>>>>>> node particularly often. The node's been performing badly regardless of
>>>>>>>>>> whether it's compacting or not.
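As a side note on reading that output, the progress column is just completed
bytes over total bytes; the two rows above reproduce as:

```python
# Rows lifted from the compactionstats output above: (table, completed, total) in bytes
tasks = [
    ("interactions", 240_410_213, 161_172_668_724),
    ("decisions.decisions_q_idx", 120_815_385, 226_295_183),
]
for name, completed, total in tasks:
    # Prints the same percentages as nodetool's progress column
    print(f"{name}: {completed / total:.2%}")
```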
>>>>>>>>>>
>>>>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>>>>
>>>>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We recently added a new node to our cluster in order to replace
>>>>>>>>>>> a node that died (hardware failure we believe). For the next two weeks it
>>>>>>>>>>> had high disk and network activity. We replaced the server, but it's
>>>>>>>>>>> happened again. We've looked into memory allowances, disk performance,
>>>>>>>>>>> number of connections, and all the nodetool stats, but can't find the cause
>>>>>>>>>>> of the issue.
>>>>>>>>>>>
>>>>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads,
>>>>>>>>>>> in comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>>>>>>> a cause.
>>>>>>>>>>>
>>>>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>>>>> bad node (D) has less data.
>>>>>>>>>>>
>>>>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far
>>>>>>>>>>> higher than the rest.
>>>>>>>>>>>
>>>>>>>>>>> The only other difference this node has to the rest of the
>>>>>>>>>>> cluster is that its on the ext4 filesystem, whereas the rest are ext3, but
>>>>>>>>>>> we've done plenty of testing there and can't see how that would affect
>>>>>>>>>>> performance on this node so much.
>>>>>>>>>>>
>>>>>>>>>>> Nothing of note in system.log.
>>>>>>>>>>>
>>>>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>>>>
>>>>>>>>>>> Best wishes,
>>>>>>>>>>> Vic
>>>>>>>>>>>
>>>>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>>>>
>>>>>>>>>>> Good node:
>>>>>>>>>>> Pool Name Active Pending
>>>>>>>>>>> Completed Blocked All time blocked
>>>>>>>>>>> ReadStage 0 0
>>>>>>>>>>> 46311521 0 0
>>>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>>>> 23817366 0 0
>>>>>>>>>>> MutationStage 0 0
>>>>>>>>>>> 47389269 0 0
>>>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>>>> 11108 0 0
>>>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> GossipStage 0 0
>>>>>>>>>>> 5259908 0 0
>>>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> MigrationStage 0 0
>>>>>>>>>>> 30 0 0
>>>>>>>>>>> MemoryMeter 0 0
>>>>>>>>>>> 16563 0 0
>>>>>>>>>>> FlushWriter 0 0
>>>>>>>>>>> 39637 0 26
>>>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>>>> 19013 0 0
>>>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>>>> 9 0 0
>>>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>>>> 38026 0 0
>>>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>>>> 81740 0 0
>>>>>>>>>>> MiscStage 0 0
>>>>>>>>>>> 19196 0 0
>>>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>>>> 23 0 0
>>>>>>>>>>> CompactionExecutor 0 0
>>>>>>>>>>> 61629 0 0
>>>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> HintedHandoff 0 0
>>>>>>>>>>> 63 0 0
>>>>>>>>>>>
>>>>>>>>>>> Message type Dropped
>>>>>>>>>>> RANGE_SLICE 0
>>>>>>>>>>> READ_REPAIR 0
>>>>>>>>>>> PAGED_RANGE 0
>>>>>>>>>>> BINARY 0
>>>>>>>>>>> READ 640
>>>>>>>>>>> MUTATION 0
>>>>>>>>>>> _TRACE 0
>>>>>>>>>>> REQUEST_RESPONSE 0
>>>>>>>>>>> COUNTER_MUTATION 0
>>>>>>>>>>>
>>>>>>>>>>> Bad node:
>>>>>>>>>>> Pool Name Active Pending
>>>>>>>>>>> Completed Blocked All time blocked
>>>>>>>>>>> ReadStage 32 113
>>>>>>>>>>> 52216 0 0
>>>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>>>> 4167 0 0
>>>>>>>>>>> MutationStage 0 0
>>>>>>>>>>> 127559 0 0
>>>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>>>> 125 0 0
>>>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> GossipStage 0 0
>>>>>>>>>>> 9965 0 0
>>>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> MigrationStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> MemoryMeter 0 0
>>>>>>>>>>> 24 0 0
>>>>>>>>>>> FlushWriter 0 0
>>>>>>>>>>> 27 0 1
>>>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>>>> 96 0 0
>>>>>>>>>>> MiscStage 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>>>> 10 0 0
>>>>>>>>>>> CompactionExecutor 1 1
>>>>>>>>>>> 73 0 0
>>>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>>>> 0 0 0
>>>>>>>>>>> HintedHandoff 0 0
>>>>>>>>>>> 15 0 0
>>>>>>>>>>>
>>>>>>>>>>> Message type Dropped
>>>>>>>>>>> RANGE_SLICE 130
>>>>>>>>>>> READ_REPAIR 1
>>>>>>>>>>> PAGED_RANGE 0
>>>>>>>>>>> BINARY 0
>>>>>>>>>>> READ 31032
>>>>>>>>>>> MUTATION 865
>>>>>>>>>>> _TRACE 0
>>>>>>>>>>> REQUEST_RESPONSE 7
>>>>>>>>>>> COUNTER_MUTATION 0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] `nodetool status` output:
>>>>>>>>>>>
>>>>>>>>>>> Status=Up/Down
>>>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>>> -- Address Load Tokens Owns Host
>>>>>>>>>>> ID Rack
>>>>>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>>>>>
>>>>>>>>>>> [2] Disk read/write ops:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>>>>
>>>>>>>>>>> [3] Network in/out:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
Hi all,
Just to let you know, we finally figured this out on Friday. It turns out
the new nodes had an older version of the kernel installed. Upgrading the
kernel solved our issues. For reference, the "bad" kernel was
3.2.0-75-virtual, upgrading to 3.2.0-86-virtual resolved the issue. We
still don't fully understand why this kernel bug didn't affect *all *our
nodes (in the end we had three nodes with that kernel, only two of them
exhibited this issue), but there we go.
Thanks everyone for your help
Cheers,
Griff
On 14 January 2016 at 15:14, James Griffin <ja...@idioplatform.com>
wrote:
> Hi Kai,
>
> Well observed - running `nodetool status` without specifying keyspace does
> report ~33% on each node. We have two keyspaces on this cluster - if I
> specify either of them the ownership reported by each node is 100%, so I
> believe the repair completed successfully.
>
> Best wishes,
>
> Griff
>
> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
> CTO
> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
> @imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin
> idio helps major brands and publishers to build closer relationships with
> their customers and prospects by learning from their content consumption
> and acting on that insight. We call it Content Intelligence, and it
> integrates with your existing marketing technology to provide detailed
> customer interest profiles in real-time across all channels, and to
> personalize content into every channel for every customer. See
> http://idioplatform.com
> <https://t.yesware.com/tl/0e637e4938676b6f3897def79d0810a71e59612e/10068de2036c2daf922e0a879bb2fe92/9dae8be0f7693bf2b28a88cc4b38c554?ytl=http%3A%2F%2Fidioplatform.com%2F> for
> more information.
>
> On 14 January 2016 at 15:08, Kai Wang <de...@gmail.com> wrote:
>
>> James,
>>
>> I may miss something. You mentioned your cluster had RF=3. Then why does
>> "nodetool status" show each node owns 1/3 of the data especially after a
>> full repair?
>>
>> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
>> james.griffin@idioplatform.com> wrote:
>>
>>> Hi Kai,
>>>
>>> Below - nothing going on that I can see
>>>
>>> $ nodetool netstats
>>> Mode: NORMAL
>>> Not sending any streams.
>>> Read Repair Statistics:
>>> Attempted: 0
>>> Mismatch (Blocking): 0
>>> Mismatch (Background): 0
>>> Pool Name Active Pending Completed
>>> Commands n/a 0 6326
>>> Responses n/a 0 219356
>>>
>>>
>>>
>>> Best wishes,
>>>
>>> Griff
>>>
>>> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
>>> CTO
>>> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 |
>>> Twitter: @imaginaryroots <http://twitter.com/imaginaryroots> | Skype:
>>> j.s.griffin
>>> idio helps major brands and publishers to build closer relationships
>>> with their customers and prospects by learning from their content
>>> consumption and acting on that insight. We call it Content Intelligence,
>>> and it integrates with your existing marketing technology to provide
>>> detailed customer interest profiles in real-time across all channels, and
>>> to personalize content into every channel for every customer. See
>>> http://idioplatform.com
>>> <https://t.yesware.com/tl/0e637e4938676b6f3897def79d0810a71e59612e/10068de2036c2daf922e0a879bb2fe92/9dae8be0f7693bf2b28a88cc4b38c554?ytl=http%3A%2F%2Fidioplatform.com%2F> for
>>> more information.
>>>
>>> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>>>
>>>> James,
>>>>
>>>> Can you post the result of "nodetool netstats" on the bad node?
>>>>
>>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>>> james.griffin@idioplatform.com> wrote:
>>>>
>>>>> A summary of what we've done this morning:
>>>>>
>>>>> - Noted that there are no GCInspector lines in system.log on bad
>>>>> node (there are GCInspector logs on other healthy nodes)
>>>>> - Turned on GC logging, noted that we had logs which stated out
>>>>> total time for which application threads were stopped was high - ~10s.
>>>>> - Not seeing failures or any kind (promotion or concurrent mark)
>>>>> - Attached Visual VM: noted that heap usage was very low (~5%
>>>>> usage and stable) and it didn't display hallmarks GC of activity. PermGen
>>>>> also very stable
>>>>> - Downloaded GC logs and examined in GC Viewer. Noted that:
>>>>> - We had lots of pauses (again around 10s), but no full GC.
>>>>> - From a 2,300s sample, just over 2,000s were spent with
>>>>> threads paused
>>>>> - Spotted many small GCs in the new space - realised that Xmn
>>>>> value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>>> 937M - no change in server behaviour (high load, high reads/s on disk, high
>>>>> CPU wait)
>>>>>
>>>>> Current output of jstat:
>>>>>
>>>>> S0 S1 E O P YGC YGCT FGC FGCT GCT
>>>>> 2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039
>>>>> 63.724
>>>>> 3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103
>>>>> 1.915
>>>>>
>>>>> Correct me if I'm wrong, but it seems 3 is lot more healthy GC wise
>>>>> than 2 (which has normal load statistics).
>>>>>
>>>>> Anywhere else you can recommend we look?
>>>>>
>>>>> Griff
>>>>>
>>>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>>>> cause for that.
>>>>>> Can you just search the word GCInspector in system.log and share the
>>>>>> frequency of minor and full gc. Moreover, are you printing promotion
>>>>>> failures in gc logs?? Why full gc ia getting triggered??promotion failures
>>>>>> or concurrent mode failures?
>>>>>>
>>>>>> If you are on CMS, you need to fine tune your heap options to address
>>>>>> full gc.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>> Sent from Yahoo Mail on Android
>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>>>> <ja...@idioplatform.com> wrote:
>>>>>> I think I was incorrect in assuming GC wasn't an issue due to the
>>>>>> lack of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>>>>> differences, though
>>>>>> comparing the startup flags on the two machines show the GC config is
>>>>>> identical.:
>>>>>>
>>>>>> $ jstat -gcutil
>>>>>> S0 S1 E O P YGC YGCT FGC FGCT
>>>>>> GCT
>>>>>> 2 5.08 0.00 55.72 18.24 59.90 25986 619.827 28 1.597
>>>>>> 621.424
>>>>>> 3 0.00 0.00 22.79 17.87 59.99 422600 11225.979 668 57.383
>>>>>> 11283.361
>>>>>>
>>>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>>>
>>>>>> $ iostat -dmx md0
>>>>>>
>>>>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
>>>>>> avgrq-sz avgqu-sz await r_await w_await svctm %util
>>>>>> 2 md0 0.00 0.00 339.00 0.00 9.77 0.00
>>>>>> 59.00 0.00 0.00 0.00 0.00 0.00 0.00
>>>>>> 3 md0 0.00 0.00 2069.00 1.00 85.85 0.00
>>>>>> 84.94 0.00 0.00 0.00 0.00 0.00 0.00
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>>>>> read ops are so high when no IO intensive activity such as repair and
>>>>>>> compaction is running on node 3.May be you can try investigating logs to
>>>>>>> see whats happening.
>>>>>>>
>>>>>>> Others on the mailing list could also share their views on the
>>>>>>> situation.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>> Hi Anuj,
>>>>>>>
>>>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>>>> following the Datastax documentation for replacing running nodes,
>>>>>>> since the nodes themselves were running fine; it was just that the
>>>>>>> servers had been incorrectly initialised and thus had less disk
>>>>>>> space. The status below shows node 2 has significantly higher load,
>>>>>>> but as I say, 2 is operating normally and is running compactions, so
>>>>>>> I guess that's not an issue?
>>>>>>>
>>>>>>> Datacenter: datacenter1
>>>>>>> =======================
>>>>>>> Status=Up/Down
>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>> -- Address Load Tokens Owns Host ID
>>>>>>> Rack
>>>>>>> UN 1 253.59 GB 256 31.7%
>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>> UN 2 302.23 GB 256 35.3%
>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>> UN 3 265.02 GB 256 33.1%
>>>>>>> 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>>>>>>>
>>>>>>> Griff
>>>>>>>
>>>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Revisiting the thread, I can see that nodetool status had both good
>>>>>>>> and bad nodes at the same time. How do you replace nodes? When you
>>>>>>>> say "bad node", I understand that the node is no longer usable even
>>>>>>>> though Cassandra is UP? Is that correct?
>>>>>>>>
>>>>>>>> If a node is in bad shape and not working, adding a new node may
>>>>>>>> trigger streaming of huge amounts of data from the bad node too.
>>>>>>>> Have you considered using the procedure for replacing a dead node?
>>>>>>>>
>>>>>>>> Please share the latest nodetool status.
>>>>>>>>
>>>>>>>> nodetool output shared earlier:
>>>>>>>>
>>>>>>>> `nodetool status` output:
>>>>>>>>
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> -- Address Load Tokens Owns Host
>>>>>>>> ID Rack
>>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anuj
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We’ve spent a few days running things but are in the same position.
>>>>>>>> To add some more flavour:
>>>>>>>>
>>>>>>>>
>>>>>>>> - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>>> running in this configuration for a few years without any real issues
>>>>>>>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>>> brought in to replace two other nodes which had a failed RAID0
>>>>>>>> configuration and were thus lacking in disk space.
>>>>>>>> - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>>> wait, IO and load metrics
>>>>>>>> - We subsequently brought 3 into the ring: as soon as 3 was
>>>>>>>> fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>>> levels. Those same stats on 3, however, sky-rocketed
>>>>>>>> - We’ve confirmed configuration across all three nodes are
>>>>>>>> identical and in line with the recommended production settings
>>>>>>>> - We’ve run a full repair
>>>>>>>> - Node 2 is currently running compactions, 1 & 3 aren’t and
>>>>>>>> have no pending
>>>>>>>> - There is no GC happening from what I can see. Node 1 has a GC
>>>>>>>> log, but that’s not been written to since May last year
>>>>>>>>
>>>>>>>>
>>>>>>>> What we’re seeing at the moment is similar and normal stats on
>>>>>>>> nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>>>
>>>>>>>>
>>>>>>>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>>>
>>>>>>>>
>>>>>>>> Can you recommend any next steps?
>>>>>>>>
>>>>>>>> Griff
>>>>>>>>
>>>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Vickrum,
>>>>>>>>>
>>>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>>>
>>>>>>>>> 1. Analyse a sar report to check system health - CPU, memory,
>>>>>>>>> swap, disk, etc. The system seems to be overloaded; this is
>>>>>>>>> evident from the mutation drops.
>>>>>>>>>
>>>>>>>>> 2. Make sure that all recommended Cassandra production settings
>>>>>>>>> available at the Datastax site are applied; disable zone reclaim
>>>>>>>>> and THP.
>>>>>>>>>
>>>>>>>>> 3. Run a full repair on the bad node and check the data size. The
>>>>>>>>> node owns the largest token range but has significantly less data;
>>>>>>>>> I doubt that bootstrapping happened properly.
>>>>>>>>>
>>>>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>>>>> compactions by reducing concurrent compactors or compaction
>>>>>>>>> throughput.
>>>>>>>>>
>>>>>>>>> 5. Analyse the logs to make sure bootstrapping happened without
>>>>>>>>> errors.
>>>>>>>>>
>>>>>>>>> 6. Look for other common performance problems, such as GC pauses,
>>>>>>>>> to make sure that dropped mutations are not caused by GC pauses.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Anuj
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>>>> # nodetool compactionstats
>>>>>>>>> pending tasks: 22
>>>>>>>>> compaction type  keyspace              table                      completed  total         unit   progress
>>>>>>>>> Compaction       production_analytics  interactions               240410213  161172668724  bytes  0.15%
>>>>>>>>> Compaction       production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
>>>>>>>>> Active compaction remaining time : 2h39m58s
>>>>>>>>>
>>>>>>>>> Worth mentioning that compactions haven't been running on this
>>>>>>>>> node particularly often. The node's been performing badly regardless of
>>>>>>>>> whether it's compacting or not.
>>>>>>>>>
>>>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>>>
>>>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>>>>> issue.
>>>>>>>>>>
>>>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads,
>>>>>>>>>> in comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>>>>>> a cause.
>>>>>>>>>>
>>>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>>>> bad node (D) has less data.
>>>>>>>>>>
>>>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far
>>>>>>>>>> higher than the rest.
>>>>>>>>>>
>>>>>>>>>> The only other difference this node has to the rest of the
>>>>>>>>>> cluster is that it's on the ext4 filesystem, whereas the rest are
>>>>>>>>>> ext3, but we've done plenty of testing there and can't see how
>>>>>>>>>> that would affect performance on this node so much.
>>>>>>>>>>
>>>>>>>>>> Nothing of note in system.log.
>>>>>>>>>>
>>>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>>>
>>>>>>>>>> Best wishes,
>>>>>>>>>> Vic
>>>>>>>>>>
>>>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>>>
>>>>>>>>>> Good node:
>>>>>>>>>> Pool Name Active Pending
>>>>>>>>>> Completed Blocked All time blocked
>>>>>>>>>> ReadStage 0 0
>>>>>>>>>> 46311521 0 0
>>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>>> 23817366 0 0
>>>>>>>>>> MutationStage 0 0
>>>>>>>>>> 47389269 0 0
>>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>>> 11108 0 0
>>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> GossipStage 0 0
>>>>>>>>>> 5259908 0 0
>>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> MigrationStage 0 0
>>>>>>>>>> 30 0 0
>>>>>>>>>> MemoryMeter 0 0
>>>>>>>>>> 16563 0 0
>>>>>>>>>> FlushWriter 0 0
>>>>>>>>>> 39637 0 26
>>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>>> 19013 0 0
>>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>>> 9 0 0
>>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>>> 38026 0 0
>>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>>> 81740 0 0
>>>>>>>>>> MiscStage 0 0
>>>>>>>>>> 19196 0 0
>>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>>> 23 0 0
>>>>>>>>>> CompactionExecutor 0 0
>>>>>>>>>> 61629 0 0
>>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> HintedHandoff 0 0
>>>>>>>>>> 63 0 0
>>>>>>>>>>
>>>>>>>>>> Message type Dropped
>>>>>>>>>> RANGE_SLICE 0
>>>>>>>>>> READ_REPAIR 0
>>>>>>>>>> PAGED_RANGE 0
>>>>>>>>>> BINARY 0
>>>>>>>>>> READ 640
>>>>>>>>>> MUTATION 0
>>>>>>>>>> _TRACE 0
>>>>>>>>>> REQUEST_RESPONSE 0
>>>>>>>>>> COUNTER_MUTATION 0
>>>>>>>>>>
>>>>>>>>>> Bad node:
>>>>>>>>>> Pool Name Active Pending
>>>>>>>>>> Completed Blocked All time blocked
>>>>>>>>>> ReadStage 32 113
>>>>>>>>>> 52216 0 0
>>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>>> 4167 0 0
>>>>>>>>>> MutationStage 0 0
>>>>>>>>>> 127559 0 0
>>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>>> 125 0 0
>>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> GossipStage 0 0
>>>>>>>>>> 9965 0 0
>>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> MigrationStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> MemoryMeter 0 0
>>>>>>>>>> 24 0 0
>>>>>>>>>> FlushWriter 0 0
>>>>>>>>>> 27 0 1
>>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>>> 96 0 0
>>>>>>>>>> MiscStage 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>>> 10 0 0
>>>>>>>>>> CompactionExecutor 1 1
>>>>>>>>>> 73 0 0
>>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>>> 0 0 0
>>>>>>>>>> HintedHandoff 0 0
>>>>>>>>>> 15 0 0
>>>>>>>>>>
>>>>>>>>>> Message type Dropped
>>>>>>>>>> RANGE_SLICE 130
>>>>>>>>>> READ_REPAIR 1
>>>>>>>>>> PAGED_RANGE 0
>>>>>>>>>> BINARY 0
>>>>>>>>>> READ 31032
>>>>>>>>>> MUTATION 865
>>>>>>>>>> _TRACE 0
>>>>>>>>>> REQUEST_RESPONSE 7
>>>>>>>>>> COUNTER_MUTATION 0
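As a rough measure of how unhealthy the bad node is, the dropped READ count can be set against completed ReadStage tasks. This assumes completed plus dropped approximates attempted reads, so treat it as a back-of-envelope figure rather than an exact metric:

```python
# Rough read drop-rate from the tpstats output above, assuming
# attempted reads ~ ReadStage completed + READ dropped.
good = {"completed": 46311521, "dropped": 640}
bad = {"completed": 52216, "dropped": 31032}

for name, n in (("good", good), ("bad", bad)):
    rate = 100 * n["dropped"] / (n["completed"] + n["dropped"])
    print(f"{name} node: {rate:.2f}% of reads dropped")
```

On that approximation, the good node drops a negligible fraction of reads while the bad node is dropping over a third of them.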
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1] `nodetool status` output:
>>>>>>>>>>
>>>>>>>>>> Status=Up/Down
>>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>> -- Address Load Tokens Owns Host
>>>>>>>>>> ID Rack
>>>>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
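One quick consistency check on the status output above: on a balanced vnode ring, the data held per percent of ring owned should be roughly equal across nodes. Node D holds about half as much per percent as its peers, which supports the suspicion that its bootstrap did not complete properly:

```python
# Sanity-check Load vs token ownership from the nodetool status above.
# On a balanced ring, GB held per percent owned should be similar per node.
nodes = {
    "A": (252.37, 23.0),
    "B": (245.91, 24.4),
    "C": (254.79, 23.7),
    "D": (163.85, 28.8),  # the problem node
}
for name, (load_gb, owns_pct) in nodes.items():
    print(f"{name}: {load_gb / owns_pct:.2f} GB per % owned")
```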
>>>>>>>>>>
>>>>>>>>>> [2] Disk read/write ops:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>>>
>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>>>
>>>>>>>>>> [3] Network in/out:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>>>
>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you Sebastián!
On 15 Jan 2016, at 19:09 , Sebastian Estevez <se...@datastax.com>> wrote:
The recommended (and default when available) heap size for Cassandra is 8GB, and for new size it's 100MB per core.
Your mileage may vary based on workload, hardware, etc.
There are also some alternative JVM tuning schools of thought. See CASSANDRA-8150 (large heap) and CASSANDRA-7486 (G1GC).
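For context, the default heap calculation Cassandra ships with in cassandra-env.sh works out roughly as below. This is paraphrased from memory, so treat the exact caps as an assumption and check your own cassandra-env.sh:

```python
# Paraphrase (an assumption -- verify against your cassandra-env.sh) of the
# default heap heuristic: half of RAM capped at 1 GB, or a quarter of RAM
# capped at 8 GB, whichever is larger.
def default_max_heap_mb(system_memory_mb: int) -> int:
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)

print(default_max_heap_mb(16 * 1024))  # 16 GB box -> 4096
```

On a 16 GB machine this heuristic picks a 4 GB heap, so the 6 GB used in this thread is already above the stock default; the 8 GB recommendation applies where RAM comfortably allows it.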
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay.
On Fri, Jan 15, 2016 at 4:00 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Thank you Sebastián for your useful advice. I managed to restart the nodes, but I needed to delete all the commit logs, not only the last one specified. Nevertheless, I’m back in business.
Would there be a better memory configuration to select for my nodes in a C* 3 cluster? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thanks for your help.
Jean
On 15 Jan 2016, at 24:24 , Sebastian Estevez <se...@datastax.com>> wrote:
Try starting the other nodes. You may have to delete or move (mv) the commitlog segment referenced in the error message for the node to come up, since apparently it is corrupted.
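Rather than deleting commitlog segments outright, a gentler option is to move them aside so they remain available for inspection or later replay. This is only an illustrative sketch; the directory path is the package-install default and an assumption, not something taken from this thread:

```python
import shutil
from pathlib import Path

def quarantine_segments(commitlog_dir: Path, quarantine: Path) -> list[str]:
    """Move CommitLog-* segments aside instead of deleting them."""
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for segment in sorted(commitlog_dir.glob("CommitLog-*.log")):
        shutil.move(str(segment), str(quarantine / segment.name))
        moved.append(segment.name)
    return moved

# Typical invocation (path is the assumed package-install default):
# quarantine_segments(Path("/var/lib/cassandra/commitlog"),
#                     Path("/var/lib/cassandra/commitlog.bad"))
```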
All the best,
Sebastián Estévez
On Thu, Jan 14, 2016 at 1:00 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?
On 14 Jan 2016, at 18:30, Jake Luciani <ja...@gmail.com>> wrote:
Yes you can restart without data loss.
Can you please include info about how much data you have loaded per node and perhaps what your schema looks like?
Thanks
On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Ok, I will open a ticket.
How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thanks
Jean
On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:
I don't think that's a known issue. Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?
On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,
I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"
I have been loading a lot of data into this cluster over the last 24 hours. The system behaved, I think, very nicely. It was loading very fast and giving excellent read times. There were no error messages until this one:
ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I have the following error;
ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]
I can no longer start my nodes.
How can I restart my cluster?
Is this problem known?
Is there a better Cassandra 3 version which would behave better with respect to this problem?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thank you very much for your advice.
Kind regards
Jean
--
Tyler Hobbs
DataStax<http://datastax.com/>
--
http://twitter.com/tjake
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Sebastian Estevez <se...@datastax.com>.
The recommended (and default when available) heap size for Cassandra is 8GB,
and for new size it's 100MB per core.
Your mileage may vary based on workload, hardware, etc.
There are also some alternative JVM tuning schools of thought. See
CASSANDRA-8150 (large heap) and CASSANDRA-7486 (G1GC).
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
On Fri, Jan 15, 2016 at 4:00 AM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:
> Thank you Sebastián for your useful advice. I managed to restart the
> nodes, but I needed to delete all the commit logs, not only the last one
> specified. Nevertheless, I’m back in business.
>
> Would there be a better memory configuration to select for my nodes in a
> C* 3 cluster? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for
> a 16GB RAM node.
>
> Thanks for your help.
>
> Jean
>
> On 15 Jan 2016, at 24:24 , Sebastian Estevez <
> sebastian.estevez@datastax.com> wrote:
>
>
> Try starting the other nodes. You may have to delete or move (mv) the
> commitlog segment referenced in the error message for the node to come
> up, since apparently it is corrupted.
>
> All the best,
>
> Sebastián Estévez
> Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
>
> On Thu, Jan 14, 2016 at 1:00 PM, Jean Tremblay <
> jean.tremblay@zen-innovations.com> wrote:
>
>> How can I restart?
>> It blocks with the error listed below.
>> Are my memory settings good for my configuration?
>>
>> On 14 Jan 2016, at 18:30, Jake Luciani <ja...@gmail.com> wrote:
>>
>> Yes you can restart without data loss.
>>
>> Can you please include info about how much data you have loaded per node
>> and perhaps what your schema looks like?
>>
>> Thanks
>>
>> On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <
>> jean.tremblay@zen-innovations.com> wrote:
>>
>>>
>>> Ok, I will open a ticket.
>>>
>>> How could I restart my cluster without losing everything?
>>> Would there be a better memory configuration to select for my nodes?
>>> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
>>>
>>> Thanks
>>>
>>> Jean
>>>
>>> On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com> wrote:
>>>
>>> I don't think that's a known issue. Can you open a ticket at
>>> https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
>>> along with the commitlog files and the mutation that was saved to /tmp?
>>>
>>> On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
>>> jean.tremblay@zen-innovations.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
>>>> I use Cassandra 3.1.1.
>>>> I use the following setup for the memory:
>>>> MAX_HEAP_SIZE="6G"
>>>> HEAP_NEWSIZE="496M"
>>>>
>>>> I have been loading a lot of data into this cluster over the last 24
>>>> hours. The system behaved, I think, very nicely. It was loading very
>>>> fast and giving excellent read times. There were no error messages
>>>> until this one:
>>>>
>>>>
>>>> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
>>>> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
>>>> Exiting forcefully due to:
>>>> java.lang.OutOfMemoryError: Java heap space
>>>> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
>>>> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
>>>> at
>>>> org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>> ~[na:1.8.0_65]
>>>> at
>>>> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>>>> ~[apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>>>> [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
>>>>
>>>> 4 nodes out of 5 crashed with this error message. Now when I want to
>>>> restart the first node I have the following error;
>>>>
>>>> ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 -
>>>> Exiting due to error while processing commit log during initialization.
>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>>>> Unexpected error deserializing mutation; saved to
>>>> /tmp/mutation7465380878750576105dat. This may be caused by replaying a
>>>> mutation against a table with the same name but incompatible schema.
>>>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>>>> enough bytes to read a map
>>>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
>>>> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]
>>>>
>>>> I can no longer start my nodes.
>>>>
>>>> How can I restart my cluster?
>>>> Is this problem known?
>>>> Is there a Cassandra 3 version that behaves better with respect to
>>>> this problem?
>>>> Would there be a better memory configuration to select for my nodes?
>>>> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM
>>>> node.
>>>>
>>>>
>>>> Thank you very much for your advice.
>>>>
>>>> Kind regards
>>>>
>>>> Jean
>>>>
>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>>
>
>
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you Sebastián for your useful advice. I managed to restart the nodes, but I needed to delete all the commit logs, not only the one specified. Nevertheless I’m back in business.
Would there be a better memory configuration to select for my nodes in a C* 3 cluster? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.
Thanks for your help.
Jean
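The heap question repeated throughout this thread can be checked against Cassandra's own defaults. Below is a minimal sketch of the calculate_heap_sizes() heuristic that ships in conf/cassandra-env.sh, applied by hand. The 16 GB figure comes from the thread (the "16M" mentions are clearly GB); the 8-core count is an assumption, since the thread never states it.

```shell
# Sketch of the calculate_heap_sizes() heuristic from conf/cassandra-env.sh.
# Assumptions: 16 GB of RAM and 8 cores (core count not stated in the thread).
system_memory_mb=16384
cpu_cores=8

# MAX_HEAP_SIZE = max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
half=$(( system_memory_mb / 2 ));    [ "$half" -gt 1024 ] && half=1024
quarter=$(( system_memory_mb / 4 )); [ "$quarter" -gt 8192 ] && quarter=8192
max_heap_mb=$(( half > quarter ? half : quarter ))

# HEAP_NEWSIZE = min(MAX_HEAP_SIZE / 4, 100 MB per core)
desired_yg_mb=$(( max_heap_mb / 4 ))
max_sensible_yg_mb=$(( cpu_cores * 100 ))
heap_newsize_mb=$(( desired_yg_mb < max_sensible_yg_mb ? desired_yg_mb : max_sensible_yg_mb ))

echo "MAX_HEAP_SIZE=${max_heap_mb}M HEAP_NEWSIZE=${heap_newsize_mb}M"
```

By those stock rules a 16 GB box gets roughly a 4 GB heap with an 800 MB new generation, so the hand-set 6G/496M in the thread runs a larger total heap with a proportionally smaller new generation than the defaults would pick.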
On 15 Jan 2016, at 24:24 , Sebastian Estevez <se...@datastax.com>> wrote:
Try starting the other nodes. You may have to delete or mv the commitlog segment referenced in the error message for the node to come up since apparently it is corrupted.
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
On Thu, Jan 14, 2016 at 1:00 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?
On 14 Jan 2016, at 18:30, Jake Luciani <ja...@gmail.com>> wrote:
Yes you can restart without data loss.
Can you please include info about how much data you have loaded per node and perhaps what your schema looks like?
Thanks
On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Ok, I will open a ticket.
How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.
Thanks
Jean
On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:
I don't think that's a known issue. Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?
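Tyler's request (schema, commitlog files, and the saved mutation) can be sketched as a small bundling step. The demo below runs against a scratch tree so it is self-contained; on a real node the inputs would be the output of `cqlsh -e "DESCRIBE SCHEMA"`, the node's commitlog directory (default /var/lib/cassandra/commitlog), and the /tmp/mutation7465380878750576105dat file named in the error. Those paths are assumptions from stock defaults, not something the thread confirms.

```shell
# Sketch: collect the artifacts for a CASSANDRA JIRA ticket into one tarball.
# Scratch tree with stand-in files; on a real node substitute the actual
# schema dump, commitlog segments, and the saved mutation from /tmp.
root=$(mktemp -d)
mkdir -p "$root/commitlog"
echo "-- output of: cqlsh -e \"DESCRIBE SCHEMA\"" > "$root/schema.cql"
touch "$root/commitlog/CommitLog-6-1.log"          # stand-in segment
touch "$root/mutation7465380878750576105dat"       # stand-in saved mutation

tar czf "$root/for-jira.tar.gz" -C "$root" \
    schema.cql commitlog mutation7465380878750576105dat
```

Attaching everything in one archive keeps the schema, the replayed segments, and the rejected mutation together so the failure is reproducible from the ticket alone.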
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Sebastian Estevez <se...@datastax.com>.
Try starting the other nodes. You may have to delete or mv the commitlog
segment referenced in the error message for the node to come up since
apparently it is corrupted.
All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com
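Sebastián's suggestion above, moving the corrupted commitlog segment aside rather than deleting it, can be sketched as follows. This demo runs against a scratch directory so nothing real is touched; on a live node commitlog_dir would be Cassandra's actual commit log directory (default /var/lib/cassandra/commitlog) and bad_segment would be the file cited in your replay error. Both names here are made-up assumptions.

```shell
# Demo of quarantining a corrupted commit log segment before restart.
# Scratch directory and hypothetical segment name; on a real node use the
# actual commitlog directory and the file named in the replay error.
commitlog_dir=$(mktemp -d)                    # stands in for /var/lib/cassandra/commitlog
bad_segment="CommitLog-6-1452787200000.log"   # hypothetical segment name
touch "$commitlog_dir/$bad_segment"           # stand-in for the real segment

# mv, not rm: keep the segment so it can be inspected or attached to a ticket.
mkdir -p "$commitlog_dir/quarantine"
mv "$commitlog_dir/$bad_segment" "$commitlog_dir/quarantine/"
```

Note the follow-up in this thread: in Jean's case one segment was not enough, and all commit logs had to be moved aside, at the cost of losing whatever unflushed mutations they held.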
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Jean Tremblay <je...@zen-innovations.com>.
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Jake Luciani <ja...@gmail.com>.
Yes you can restart without data loss.
Can you please include info about how much data you have loaded per node
and perhaps what your schema looks like?
Thanks
>> 4 nodes out of 5 crashed with this error message. Now when I want to
>> restart the first node I have the following error;
>>
>> ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 -
>> Exiting due to error while processing commit log during initialization.
>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
>> Unexpected error deserializing mutation; saved to
>> /tmp/mutation7465380878750576105dat. This may be caused by replaying a
>> mutation against a table with the same name but incompatible schema.
>> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
>> enough bytes to read a map
>> [...]
>>
>> I can no longer start my nodes.
>>
>> How can I restart my cluster?
>> Is this problem known?
>> Is there a better Cassandra 3 version which would behave better with
>> respect to this problem?
>> Would there be a better memory configuration to select for my nodes?
>> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM
>> node.
>>
>>
>> Thank you very much for your advice.
>>
>> Kind regards
>>
>> Jean
>>
>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>
--
http://twitter.com/tjake
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Jean Tremblay <je...@zen-innovations.com>.
Ok, I will open a ticket.
How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thanks
Jean
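Nothing in this thread prescribes it, but the usual last-resort way past a failing commit-log replay is to move the segments aside before starting the node; writes present only in those segments are lost on that node, though with replication a subsequent `nodetool repair` can usually restore them. A minimal sketch with hypothetical paths (the real location is commitlog_directory in cassandra.yaml, Cassandra stopped first):

```shell
# sideline_commitlog: move CommitLog-* segments out of the way so a node
# that crashes during replay can start. Segments are moved, not deleted,
# so they can still be attached to a ticket or restored later.
sideline_commitlog() {
  src="$1"; dst="$2"
  mkdir -p "$dst"
  for seg in "$src"/CommitLog-*; do
    [ -e "$seg" ] || continue   # no segments present
    mv "$seg" "$dst"/
  done
}

# Demo against a throwaway directory; on a real node you would point this
# at e.g. /var/lib/cassandra/commitlog (an assumption -- check cassandra.yaml).
demo=$(mktemp -d)
mkdir -p "$demo/commitlog"
touch "$demo/commitlog/CommitLog-6-123.log"
sideline_commitlog "$demo/commitlog" "$demo/commitlog.bad"
ls "$demo/commitlog.bad"
```

After the node starts cleanly, run `nodetool repair` so the writes lost from the sidelined segments are streamed back from the replicas.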
On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:
I don't think that's a known issue. Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?
On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,
I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"
I have been loading a lot of data in this cluster over the last 24 hours. The system behaved, I think, very nicely: it was loading very fast and giving excellent read times. There were no error messages until this one:
ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
[...]
4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I have the following error;
ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
[...]
I can no longer start my nodes.
How can I restart my cluster?
Is this problem known?
Is there a better Cassandra 3 version which would behave better with respect to this problem?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thank you very much for your advice.
Kind regards
Jean
--
Tyler Hobbs
DataStax<http://datastax.com/>
Re: Cassandra 3.1.1 with respect to HeapSpace
Posted by Tyler Hobbs <ty...@datastax.com>.
I don't think that's a known issue. Can you open a ticket at
https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
along with the commitlog files and the mutation that was saved to /tmp?
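One hedged way to bundle what's being asked for (commit logs, the saved mutation, and the schema) into a single archive for the ticket; every path below is hypothetical, and the schema dump line is commented out because it needs a live node:

```shell
# Hypothetical helper: bundle the commit logs and the saved mutation into
# one tarball to attach to the JIRA ticket. All paths are assumptions.
collect_ticket_artifacts() {
  commitlog_dir="$1"; mutation_file="$2"; out_tar="$3"
  staging=$(mktemp -d)
  cp -r "$commitlog_dir" "$staging/commitlog"
  cp "$mutation_file" "$staging/"
  # On a live node you would also capture the schema, e.g.:
  # cqlsh -e 'DESCRIBE SCHEMA' > "$staging/schema.cql"
  tar -czf "$out_tar" -C "$staging" .
  rm -rf "$staging"
}

# Demo against throwaway files:
tmpd=$(mktemp -d)
mkdir -p "$tmpd/commitlog"
touch "$tmpd/commitlog/CommitLog-6-1.log"
echo "saved mutation bytes" > "$tmpd/mutation.dat"
collect_ticket_artifacts "$tmpd/commitlog" "$tmpd/mutation.dat" "$tmpd/ticket.tgz"
tar -tzf "$tmpd/ticket.tgz"
```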
On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:
> Hi,
>
> I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
> I use Cassandra 3.1.1.
> I use the following setup for the memory:
> MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="496M"
>
> I have been loading a lot of data in this cluster over the last 24 hours.
> The system behaved, I think, very nicely: it was loading very fast and
> giving excellent read times. There were no error messages until this one:
>
>
> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> [...]
>
> 4 nodes out of 5 crashed with this error message. Now when I want to
> restart the first node I have the following error;
>
> ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 -
> Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /tmp/mutation7465380878750576105dat. This may be caused by replaying a
> mutation against a table with the same name but incompatible schema.
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
> enough bytes to read a map
> [...]
>
> I can no longer start my nodes.
>
> How can I restart my cluster?
> Is this problem known?
> Is there a better Cassandra 3 version which would behave better with
> respect to this problem?
> Would there be a better memory configuration to select for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
>
>
> Thank you very much for your advice.
>
> Kind regards
>
> Jean
>
--
Tyler Hobbs
DataStax <http://datastax.com/>
Cassandra 3.1.1 with respect to HeapSpace
Posted by Jean Tremblay <je...@zen-innovations.com>.
Hi,
I have a small Cassandra cluster with 5 nodes, each having 16GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"
I have been loading a lot of data in this cluster over the last 24 hours. The system behaved, I think, very nicely: it was loading very fast and giving excellent read times. There were no error messages until this one:
ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I have the following error;
ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat. This may be caused by replaying a mutation against a table with the same name but incompatible schema. Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]
I can no longer start my nodes.
How can I restart my cluster?
Is this problem known?
Is there a better Cassandra 3 version which would behave better with respect to this problem?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16GB RAM node.
Thank you very much for your advice.
Kind regards
Jean
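On the memory question: the stock cassandra-env.sh of this era derives HEAP_NEWSIZE as roughly 100MB per CPU core, capped at a quarter of the heap, so 496M against a 6G heap implies a quite small young generation and frequent minor GCs under heavy write load. A sketch only, assuming a hypothetical 8-core box; not a verified tuning for this cluster:

```shell
# cassandra-env.sh fragment -- a sketch, not a verified tuning.
# With 16GB of physical RAM a 6G heap is reasonable; the new generation
# is where the rule of thumb applies: ~100MB per core, capped at
# MAX_HEAP_SIZE / 4 (here 1536M).
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="800M"   # assumption: 8 cores * 100M; adjust to your core count
```

Note that a bigger young generation will not, by itself, prevent the OutOfMemoryError above if a single read materialises more data than the heap can hold.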
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
Hi Kai,
Well observed - running `nodetool status` without specifying a keyspace does
report ~33% on each node, since without a keyspace it cannot account for
replication. We have two keyspaces on this cluster - if I specify either of
them, the ownership reported by each node is 100% (as expected with RF=3 on
three nodes), so I believe the repair completed successfully.
Best wishes,
Griff
James "Griff" Griffin
CTO
Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
@imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin
idio helps major brands and publishers to build closer relationships with
their customers and prospects by learning from their content consumption
and acting on that insight. We call it Content Intelligence, and it
integrates with your existing marketing technology to provide detailed
customer interest profiles in real-time across all channels, and to
personalize content into every channel for every customer. See
http://idioplatform.com
for more information.
On 14 January 2016 at 15:08, Kai Wang <de...@gmail.com> wrote:
> James,
>
> I may miss something. You mentioned your cluster had RF=3. Then why does
> "nodetool status" show each node owns 1/3 of the data especially after a
> full repair?
>
> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
> james.griffin@idioplatform.com> wrote:
>
>> Hi Kai,
>>
>> Below - nothing going on that I can see
>>
>> $ nodetool netstats
>> Mode: NORMAL
>> Not sending any streams.
>> Read Repair Statistics:
>> Attempted: 0
>> Mismatch (Blocking): 0
>> Mismatch (Background): 0
>> Pool Name Active Pending Completed
>> Commands n/a 0 6326
>> Responses n/a 0 219356
>>
>>
>>
>> Best wishes,
>>
>> Griff
>>
>> James "Griff" Griffin
>> CTO
>> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 |
>> Twitter: @imaginaryroots <http://twitter.com/imaginaryroots> | Skype:
>> j.s.griffin
>> idio helps major brands and publishers to build closer relationships with
>> their customers and prospects by learning from their content consumption
>> and acting on that insight. We call it Content Intelligence, and it
>> integrates with your existing marketing technology to provide detailed
>> customer interest profiles in real-time across all channels, and to
>> personalize content into every channel for every customer. See
>> http://idioplatform.com
>> for more information.
>>
>> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>>
>>> James,
>>>
>>> Can you post the result of "nodetool netstats" on the bad node?
>>>
>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>> james.griffin@idioplatform.com> wrote:
>>>
>>>> A summary of what we've done this morning:
>>>>
>>>> - Noted that there are no GCInspector lines in system.log on bad
>>>> node (there are GCInspector logs on other healthy nodes)
>>>> - Turned on GC logging; noted logs stating that the total time for
>>>> which application threads were stopped was high - ~10s.
>>>> - Not seeing failures of any kind (promotion or concurrent mark)
>>>> - Attached VisualVM: noted that heap usage was very low (~5% usage
>>>> and stable) and it didn't display the hallmarks of GC activity. PermGen
>>>> also very stable
>>>> - Downloaded GC logs and examined in GC Viewer. Noted that:
>>>> - We had lots of pauses (again around 10s), but no full GC.
>>>> - From a 2,300s sample, just over 2,000s were spent with threads
>>>> paused
>>>> - Spotted many small GCs in the new space - realised that Xmn
>>>> value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>> 937M - no change in server behaviour (high load, high reads/s on disk, high
>>>> CPU wait)
>>>>
>>>> Current output of jstat:
>>>>
>>>> S0 S1 E O P YGC YGCT FGC FGCT GCT
>>>> 2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039 63.724
>>>> 3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103 1.915
>>>>
>>>> Correct me if I'm wrong, but it seems node 3 is a lot healthier GC-wise
>>>> than node 2 (which has the normal load statistics).
>>>>
>>>> Anywhere else you can recommend we look?
>>>>
>>>> Griff
>>>>
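As an aside on the pause accounting above: the GCT column of `jstat -gcutil` is cumulative GC seconds, so the share of wall time spent in GC can be estimated from the delta between two readings taken a known interval apart. A small sketch with made-up readings, not figures from this thread:

```shell
# gc_overhead T0 T1 INTERVAL: percentage of wall time spent in GC between
# two cumulative GCT readings (the last column of `jstat -gcutil <pid>`)
# taken INTERVAL seconds apart.
gc_overhead() {
  awk -v t0="$1" -v t1="$2" -v interval="$3" \
      'BEGIN { printf "%.1f\n", (t1 - t0) * 100 / interval }'
}

# Hypothetical readings: GCT went from 100.0s to 102.5s over a 60s window.
gc_overhead 100.0 102.5 60    # prints 4.2
```

Anything approaching the ~87% implied by "2,000s paused out of a 2,300s sample" points at GC (or safepoint) stalls rather than disk as the primary symptom.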
>>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>>> cause for that.
>>>>> Can you just search the word GCInspector in system.log and share the
>>>>> frequency of minor and full gc. Moreover, are you printing promotion
>>>>> failures in gc logs?? Why full gc ia getting triggered??promotion failures
>>>>> or concurrent mode failures?
>>>>>
>>>>> If you are on CMS, you need to fine tune your heap options to address
>>>>> full gc.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> Sent from Yahoo Mail on Android
>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>
>>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>>> <ja...@idioplatform.com> wrote:
>>>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>>>> of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
>>>>> differences, though comparing the startup flags on the two machines shows
>>>>> the GC config is identical:
>>>>>
>>>>> $ jstat -gcutil
>>>>> S0 S1 E O P YGC YGCT FGC FGCT GCT
>>>>> 2 5.08 0.00 55.72 18.24 59.90 25986 619.827 28 1.597 621.424
>>>>> 3 0.00 0.00 22.79 17.87 59.99 422600 11225.979 668 57.383 11283.361
>>>>>
>>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>>
>>>>> $ iostat -dmx md0
>>>>>
>>>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
>>>>> 2 md0 0.00 0.00 339.00 0.00 9.77 0.00 59.00 0.00 0.00 0.00 0.00 0.00 0.00
>>>>> 3 md0 0.00 0.00 2069.00 1.00 85.85 0.00 84.94 0.00 0.00 0.00 0.00 0.00 0.00
>>>>>
>>>>> Griff
>>>>>
>>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Node 2 has slightly higher data, but that should be OK. Not sure how
>>>>>> read ops are so high when no IO-intensive activity such as repair or
>>>>>> compaction is running on node 3. Maybe you can try investigating the
>>>>>> logs to see what's happening.
>>>>>>
>>>>>> Others on the mailing list could also share their views on the
>>>>>> situation.
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sent from Yahoo Mail on Android
>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>>> <ja...@idioplatform.com> wrote:
>>>>>> Hi Anuj,
>>>>>>
>>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>>> following the instructions in the DataStax documentation for replacing
>>>>>> running nodes, since the nodes were running fine; the problem was that
>>>>>> the servers had been incorrectly initialised and thus had less disk
>>>>>> space. The status below shows 2 has significantly higher load, however
>>>>>> as I say 2 is operating normally and is running compactions, so I guess
>>>>>> that's not an issue?
>>>>>>
>>>>>> Datacenter: datacenter1
>>>>>> =======================
>>>>>> Status=Up/Down
>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>> -- Address Load Tokens Owns Host ID Rack
>>>>>> UN 1 253.59 GB 256 31.7% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>> UN 2 302.23 GB 256 35.3% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>> UN 3 265.02 GB 256 33.1% 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Revisiting the thread, I can see that nodetool status had both good
>>>>>>> and bad nodes at the same time. How do you replace nodes? When you say
>>>>>>> bad node, I understand that the node is no longer usable even though
>>>>>>> Cassandra is UP? Is that correct?
>>>>>>>
>>>>>>> If a node is in bad shape and not working, adding new node may
>>>>>>> trigger streaming huge data from bad node too. Have you considered using
>>>>>>> the procedure for replacing a dead node?
>>>>>>>
>>>>>>> Please share Latest nodetool status.
>>>>>>>
>>>>>>> nodetool output shared earlier:
>>>>>>>
>>>>>>> `nodetool status` output:
>>>>>>>
>>>>>>> Status=Up/Down
>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>> -- Address Load Tokens Owns Host ID Rack
>>>>>>> UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>> UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>> UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>> UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We’ve spent a few days running things but are in the same position.
>>>>>>> To add some more flavour:
>>>>>>>
>>>>>>>
>>>>>>> - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>> running in this configuration for a few years without any real issues
>>>>>>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>> brought in to replace two other nodes which had a failed RAID0
>>>>>>> configuration and thus were lacking in disk space.
>>>>>>> - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>> wait, IO and load metrics
>>>>>>> - We subsequently brought 3 into the ring: as soon as 3 was
>>>>>>> fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>> levels. Those same stats on 3, however, sky-rocketed
>>>>>>> - We’ve confirmed the configuration across all three nodes is
>>>>>>> identical and in line with the recommended production settings
>>>>>>> - We’ve run a full repair
>>>>>>> - Node 2 is currently running compactions, 1 & 3 aren’t and have
>>>>>>> no pending
>>>>>>> - There is no GC happening from what I can see. Node 1 has a GC
>>>>>>> log, but that’s not been written to since May last year
>>>>>>>
>>>>>>>
>>>>>>> What we’re seeing at the moment is similar, normal stats on nodes
>>>>>>> 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>>
>>>>>>>
>>>>>>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>>
>>>>>>>
>>>>>>> Can you recommend any next steps?
>>>>>>>
>>>>>>> Griff
>>>>>>>
>>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Vickrum,
>>>>>>>>
>>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>>
>>>>>>>> 1. Analysis of a sar report to check system health - CPU, memory,
>>>>>>>> swap, disk etc. The system seems to be overloaded; this is evident
>>>>>>>> from the mutation drops.
>>>>>>>>
>>>>>>>> 2. Make sure that all the recommended Cassandra production settings
>>>>>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>>>>>
>>>>>>>> 3. Run a full repair on the bad node and check the data size. The node
>>>>>>>> owns the largest token range but has significantly less data. I doubt
>>>>>>>> that bootstrapping happened properly.
>>>>>>>>
>>>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>>>> compactions by reducing concurrent compactors or compaction throughput.
>>>>>>>>
>>>>>>>> 5. Analyze logs to make sure bootstrapping happened without errors.
>>>>>>>>
>>>>>>>> 6. Look for other common performance problems, such as GC pauses, to
>>>>>>>> make sure that the dropped mutations are not caused by GC pauses.
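[Editor's note on point 3: a node's reported load can be compared against its token ownership; a node holding much less data than its ownership share suggests an incomplete bootstrap. A sketch of that check, using the figures from the nodetool status output quoted in this thread - the 0.75 threshold is an arbitrary illustration, not an established rule:]

```python
# Load (GB) and token ownership fractions from the `nodetool status`
# output quoted in this thread; node D is the problem node.
nodes = {
    "A": (252.37, 0.230),
    "B": (245.91, 0.244),
    "C": (254.79, 0.237),
    "D": (163.85, 0.288),
}

total_load = sum(load for load, _ in nodes.values())
for name, (load, owns) in sorted(nodes.items()):
    load_share = load / total_load
    # A node owning 28.8% of the ranges but holding well under that share
    # of the data may not have bootstrapped (streamed) completely.
    flag = "  <-- light for its ownership" if load_share < 0.75 * owns else ""
    print(f"{name}: {load_share:5.1%} of data vs {owns:.1%} of ranges{flag}")
```

On these numbers, D holds roughly 18% of the data while owning roughly 29% of the ranges, which is consistent with the incomplete-bootstrap theory.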
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anuj
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>>> # nodetool compactionstats
>>>>>>>> pending tasks: 22
>>>>>>>> compaction type  keyspace              table                      completed  total         unit   progress
>>>>>>>> Compaction       production_analytics  interactions               240410213  161172668724  bytes  0.15%
>>>>>>>> Compaction       production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
>>>>>>>> Active compaction remaining time : 2h39m58s
>>>>>>>>
>>>>>>>> Worth mentioning that compactions haven't been running on this node
>>>>>>>> particularly often. The node's been performing badly regardless of whether
>>>>>>>> it's compacting or not.
>>>>>>>>
>>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>>
>>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads,
>>>>>>>>> in comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>>>>> a cause.
>>>>>>>>>
>>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>>> bad node (D) has less data.
>>>>>>>>>
>>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far
>>>>>>>>> higher than the rest.
>>>>>>>>>
>>>>>>>>> The only other difference this node has from the rest of the cluster
>>>>>>>>> is that it's on the ext4 filesystem, whereas the rest are ext3, but
>>>>>>>>> we've done plenty of testing there and can't see how that would affect
>>>>>>>>> performance on this node so much.
>>>>>>>>>
>>>>>>>>> Nothing of note in system.log.
>>>>>>>>>
>>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> Vic
>>>>>>>>>
>>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>>
>>>>>>>>> Good node:
>>>>>>>>> Pool Name Active Pending Completed
>>>>>>>>> Blocked All time blocked
>>>>>>>>> ReadStage 0 0
>>>>>>>>> 46311521 0 0
>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>> 23817366 0 0
>>>>>>>>> MutationStage 0 0
>>>>>>>>> 47389269 0 0
>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>> 11108 0 0
>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> GossipStage 0 0
>>>>>>>>> 5259908 0 0
>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> MigrationStage 0 0
>>>>>>>>> 30 0 0
>>>>>>>>> MemoryMeter 0 0
>>>>>>>>> 16563 0 0
>>>>>>>>> FlushWriter 0 0
>>>>>>>>> 39637 0 26
>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>> 19013 0 0
>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>> 9 0 0
>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>> 38026 0 0
>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>> 81740 0 0
>>>>>>>>> MiscStage 0 0
>>>>>>>>> 19196 0 0
>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>> 23 0 0
>>>>>>>>> CompactionExecutor 0 0
>>>>>>>>> 61629 0 0
>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> HintedHandoff 0 0
>>>>>>>>> 63 0 0
>>>>>>>>>
>>>>>>>>> Message type Dropped
>>>>>>>>> RANGE_SLICE 0
>>>>>>>>> READ_REPAIR 0
>>>>>>>>> PAGED_RANGE 0
>>>>>>>>> BINARY 0
>>>>>>>>> READ 640
>>>>>>>>> MUTATION 0
>>>>>>>>> _TRACE 0
>>>>>>>>> REQUEST_RESPONSE 0
>>>>>>>>> COUNTER_MUTATION 0
>>>>>>>>>
>>>>>>>>> Bad node:
>>>>>>>>> Pool Name Active Pending Completed
>>>>>>>>> Blocked All time blocked
>>>>>>>>> ReadStage 32 113
>>>>>>>>> 52216 0 0
>>>>>>>>> RequestResponseStage 0 0
>>>>>>>>> 4167 0 0
>>>>>>>>> MutationStage 0 0
>>>>>>>>> 127559 0 0
>>>>>>>>> ReadRepairStage 0 0
>>>>>>>>> 125 0 0
>>>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> GossipStage 0 0
>>>>>>>>> 9965 0 0
>>>>>>>>> CacheCleanupExecutor 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> MigrationStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> MemoryMeter 0 0
>>>>>>>>> 24 0 0
>>>>>>>>> FlushWriter 0 0
>>>>>>>>> 27 0 1
>>>>>>>>> ValidationExecutor 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> InternalResponseStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> AntiEntropyStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> MemtablePostFlusher 0 0
>>>>>>>>> 96 0 0
>>>>>>>>> MiscStage 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> PendingRangeCalculator 0 0
>>>>>>>>> 10 0 0
>>>>>>>>> CompactionExecutor 1 1
>>>>>>>>> 73 0 0
>>>>>>>>> commitlog_archiver 0 0
>>>>>>>>> 0 0 0
>>>>>>>>> HintedHandoff 0 0
>>>>>>>>> 15 0 0
>>>>>>>>>
>>>>>>>>> Message type Dropped
>>>>>>>>> RANGE_SLICE 130
>>>>>>>>> READ_REPAIR 1
>>>>>>>>> PAGED_RANGE 0
>>>>>>>>> BINARY 0
>>>>>>>>> READ 31032
>>>>>>>>> MUTATION 865
>>>>>>>>> _TRACE 0
>>>>>>>>> REQUEST_RESPONSE 7
>>>>>>>>> COUNTER_MUTATION 0
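[Editor's note: one way to read the two tpstats dumps above is to compare dropped READ messages against completed ReadStage tasks on each node. The counters are cumulative since each process started, so uptimes differ, but the ratio is still telling:]

```python
# READ drops vs completed ReadStage tasks, taken from the
# `nodetool tpstats` output above. Counters are cumulative
# since each Cassandra process started.
stats = {
    "good": {"completed": 46_311_521, "dropped": 640},
    "bad":  {"completed": 52_216,     "dropped": 31_032},
}

for node, s in stats.items():
    rate = s["dropped"] / (s["dropped"] + s["completed"])
    print(f"{node} node: dropped ~{rate:.2%} of reads")
```

The bad node is shedding over a third of its reads, versus effectively none on the good node - far beyond anything explainable by load imbalance alone.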
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] `nodetool status` output:
>>>>>>>>>
>>>>>>>>> Status=Up/Down
>>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>> -- Address Load Tokens Owns Host
>>>>>>>>> ID Rack
>>>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>>>
>>>>>>>>> [2] Disk read/write ops:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>>
>>>>>>>>> [3] Network in/out:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: New node has high network and disk usage.
Posted by Kai Wang <de...@gmail.com>.
James,
I may be missing something. You mentioned your cluster has RF=3. Then why does
"nodetool status" show each node owning only ~1/3 of the data, especially after
a full repair?
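[Editor's note: if memory serves, when nodetool status is run without a keyspace argument, "Owns" reflects token-range ownership only, ignoring replication; with RF=3 on a three-node ring each node should hold a replica of essentially all the data regardless. A sketch of the arithmetic, using the ownership figures from the status output in this thread:]

```python
# Token-range ownership as printed by `nodetool status` (no keyspace),
# versus the fraction of data each node stores once RF is applied.
rf = 3
token_owns = {"node1": 0.317, "node2": 0.353, "node3": 0.331}

# With RF replicas per row, each node stores roughly min(1, ownership * RF)
# of the total data set.
effective = {n: min(1.0, o * rf) for n, o in token_owns.items()}

for node in token_owns:
    print(f"{node}: owns {token_owns[node]:.1%} of ranges, "
          f"stores ~{effective[node]:.0%} of the data")
```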
On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
james.griffin@idioplatform.com> wrote:
> Hi Kai,
>
> Below - nothing going on that I can see
>
> $ nodetool netstats
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name Active Pending Completed
> Commands n/a 0 6326
> Responses n/a 0 219356
>
>
>
> Best wishes,
>
> Griff
>
> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
> CTO
> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
> @imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin
> idio helps major brands and publishers to build closer relationships with
> their customers and prospects by learning from their content consumption
> and acting on that insight. We call it Content Intelligence, and it
> integrates with your existing marketing technology to provide detailed
> customer interest profiles in real-time across all channels, and to
> personalize content into every channel for every customer. See
> http://idioplatform.com
> <https://t.yesware.com/tl/0e637e4938676b6f3897def79d0810a71e59612e/10068de2036c2daf922e0a879bb2fe92/9dae8be0f7693bf2b28a88cc4b38c554?ytl=http%3A%2F%2Fidioplatform.com%2F> for
> more information.
>
> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>
>> James,
>>
>> Can you post the result of "nodetool netstats" on the bad node?
>>
>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>> james.griffin@idioplatform.com> wrote:
>>
>>> A summary of what we've done this morning:
>>>
>>> - Noted that there are no GCInspector lines in system.log on bad
>>> node (there are GCInspector logs on other healthy nodes)
>>> - Turned on GC logging; noted logs stating that the total time for
>>> which application threads were stopped was high - ~10s.
>>> - Not seeing failures of any kind (promotion or concurrent mark)
>>> - Attached VisualVM: noted that heap usage was very low (~5% usage
>>> and stable) and it didn't display the hallmarks of GC activity. PermGen
>>> also very stable
>>> - Downloaded GC logs and examined in GC Viewer. Noted that:
>>> - We had lots of pauses (again around 10s), but no full GC.
>>> - From a 2,300s sample, just over 2,000s were spent with threads
>>> paused
>>> - Spotted many small GCs in the new space - realised that Xmn
>>> value was very low (200M against a heap size of 3750M). Increased Xmn to
>>> 937M - no change in server behaviour (high load, high reads/s on disk, high
>>> CPU wait)
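[Editor's note: the GC Viewer figures above imply a dramatic stop-the-world overhead; the back-of-envelope arithmetic:]

```python
# From the GC Viewer sample described above: just over 2,000s of a
# 2,300s window was spent with application threads stopped.
window_s = 2300.0
paused_s = 2000.0

overhead = paused_s / window_s
print(f"Application threads stopped ~{overhead:.0%} of the time")
```

At roughly 87% of wall-clock time paused, almost any throughput or latency symptom would follow from this alone - and it is worth squaring with the very low heap usage seen in VisualVM.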
>>>
>>> Current output of jstat:
>>>
>>>    S0     S1     E      O      P      YGC   YGCT    FGC  FGCT   GCT
>>> 2  0.00   45.20  12.82  26.84  76.21  2333  63.684  2    0.039  63.724
>>> 3  63.58  0.00   33.68  8.04   75.19  14    1.812   2    0.103  1.915
>>>
>>> Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise
>>> than 2 (which has normal load statistics).
>>>
>>> Anywhere else you can recommend we look?
>>>
>>> Griff
>>>
>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>> wrote:
>>>
>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>> cause for that.
>>>> Can you just search for the word GCInspector in system.log and share the
>>>> frequency of minor and full GC? Moreover, are you printing promotion
>>>> failures in the GC logs? Why is full GC getting triggered - promotion
>>>> failures or concurrent mode failures?
>>>>
>>>> If you are on CMS, you need to fine tune your heap options to address
>>>> full gc.
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>>
>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>> <ja...@idioplatform.com> wrote:
>>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>>> of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
>>>> differences, though comparing the startup flags on the two machines shows
>>>> the GC config is identical:
>>>>
>>>> $ jstat -gcutil
>>>>    S0    S1    E      O      P      YGC     YGCT       FGC  FGCT    GCT
>>>> 2  5.08  0.00  55.72  18.24  59.90  25986   619.827    28   1.597   621.424
>>>> 3  0.00  0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
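[Editor's note: the YGCT/FGCT/GCT columns in `jstat -gcutil` are cumulative seconds since JVM start, so the two nodes can be compared directly. With the numbers from the output above:]

```python
# Cumulative GC time in seconds from the `jstat -gcutil` output above.
gc_time = {
    "node2": {"young": 619.827, "full": 1.597, "total": 621.424},
    "node3": {"young": 11225.979, "full": 57.383, "total": 11283.361},
}

ratio = gc_time["node3"]["total"] / gc_time["node2"]["total"]
print(f"Node 3 has spent {ratio:.1f}x as much wall-clock time in GC as node 2")
```

One caveat: cumulative counters are only comparable if both JVMs have similar uptime, so that is worth checking before reading too much into the ratio.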
>>>>
>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>
>>>> $ iostat -dmx md0
>>>>
>>>> Device:  rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>>> 2 md0    0.00    0.00    339.00   0.00  9.77   0.00   59.00     0.00      0.00   0.00     0.00     0.00   0.00
>>>> 3 md0    0.00    0.00    2069.00  1.00  85.85  0.00   84.94     0.00      0.00   0.00     0.00     0.00   0.00
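[Editor's note: the iostat lines also allow an estimate of the average size of each read request (rMB/s divided by r/s), which hints at whether the extra reads are small random seeks or larger sequential ones. A sketch with the md0 figures above:]

```python
# rMB/s and r/s for md0 on each node, from the iostat output above.
iostat = {
    "node2": {"rmb_s": 9.77, "reads_s": 339.0},
    "node3": {"rmb_s": 85.85, "reads_s": 2069.0},
}

for node, s in iostat.items():
    avg_kb = s["rmb_s"] * 1024 / s["reads_s"]  # average KB per read request
    print(f"{node}: {s['reads_s']:.0f} reads/s at ~{avg_kb:.1f} KB each")
```

So node 3 is issuing roughly six times as many reads as node 2, and slightly larger ones - consistent with far more data being touched per query.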
>>>>
>>>> Griff
>>>>
>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Node 2 has slightly more data, but that should be OK. Not sure how
>>>>> read ops are so high when no IO-intensive activity such as repair or
>>>>> compaction is running on node 3. Maybe you can try investigating the
>>>>> logs to see what's happening.
>>>>>
>>>>> Others on the mailing list could also share their views on the
>>>>> situation.
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>>
>>>>>
>>>>>
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
Hi Kai,
Below - nothing going on that I can see
$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 6326
Responses n/a 0 219356
Best wishes,
Griff
On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
> James,
>
> Can you post the result of "nodetool netstats" on the bad node?
>
> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
> james.griffin@idioplatform.com> wrote:
>
>> A summary of what we've done this morning:
>>
>> - Noted that there are no GCInspector lines in system.log on bad node
>> (there are GCInspector logs on other healthy nodes)
>> - Turned on GC logging, noted that we had logs which stated out total
>> time for which application threads were stopped was high - ~10s.
>> - Not seeing failures or any kind (promotion or concurrent mark)
>> - Attached Visual VM: noted that heap usage was very low (~5% usage
>> and stable) and it didn't display hallmarks GC of activity. PermGen also
>> very stable
>> - Downloaded GC logs and examined in GC Viewer. Noted that:
>> - We had lots of pauses (again around 10s), but no full GC.
>> - From a 2,300s sample, just over 2,000s were spent with threads
>> paused
>> - Spotted many small GCs in the new space - realised that Xmn
>> value was very low (200M against a heap size of 3750M). Increased Xmn to
>> 937M - no change in server behaviour (high load, high reads/s on disk, high
>> CPU wait)
>>
>> Current output of jstat:
>>
>> S0 S1 E O P YGC YGCT FGC FGCT GCT
>> 2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039 63.724
>> 3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103
>> 1.915
>>
>> Correct me if I'm wrong, but it seems 3 is lot more healthy GC wise than
>> 2 (which has normal load statistics).
>>
>> Anywhere else you can recommend we look?
>>
>> Griff
>>
>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>
>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>> cause for that.
>>> Can you just search the word GCInspector in system.log and share the
>>> frequency of minor and full gc. Moreover, are you printing promotion
>>> failures in gc logs?? Why full gc ia getting triggered??promotion failures
>>> or concurrent mode failures?
>>>
>>> If you are on CMS, you need to fine tune your heap options to address
>>> full gc.
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>> <ja...@idioplatform.com> wrote:
>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>> of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>> differences, though
>>> comparing the startup flags on the two machines show the GC config is
>>> identical.:
>>>
>>> $ jstat -gcutil
>>> S0 S1 E O P YGC YGCT FGC FGCT GCT
>>> 2 5.08 0.00 55.72 18.24 59.90 25986 619.827 28 1.597
>>> 621.424
>>> 3 0.00 0.00 22.79 17.87 59.99 422600 11225.979 668 57.383
>>> 11283.361
>>>
>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>
>>> $ iostat -dmx md0
>>>
>>> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
>>> avgrq-sz avgqu-sz await r_await w_await svctm %util
>>> 2 md0 0.00 0.00 339.00 0.00 9.77 0.00
>>> 59.00 0.00 0.00 0.00 0.00 0.00 0.00
>>> 3 md0 0.00 0.00 2069.00 1.00 85.85 0.00
>>> 84.94 0.00 0.00 0.00 0.00 0.00 0.00
>>>
>>> Griff
>>>
>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>> wrote:
>>>
>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>> read ops are so high when no IO intensive activity such as repair and
>>>> compaction is running on node 3.May be you can try investigating logs to
>>>> see whats happening.
>>>>
>>>> Others on the mailing list could also share their views on the
>>>> situation.
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>>
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>> <ja...@idioplatform.com> wrote:
>>>> Hi Anuj,
>>>>
>>>> Below is the output of nodetool status. The nodes were replaced
>>>> following the instructions in the Datastax documentation for replacing
>>>> running nodes, since the nodes were running fine; it was just that the
>>>> servers had been incorrectly initialised and thus had less disk space.
>>>> The status below shows 2 has significantly higher load, however, as I
>>>> say, 2 is operating normally and is running compactions, so I guess
>>>> that's not an issue?
>>>>
>>>> Datacenter: datacenter1
>>>> =======================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> -- Address Load Tokens Owns Host ID
>>>> Rack
>>>> UN 1 253.59 GB 256 31.7%
>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>> UN 2 302.23 GB 256 35.3%
>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>> UN 3 265.02 GB 256 33.1%
>>>> 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>>>>
>>>> Griff
>>>>
>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Revisiting the thread I can see that nodetool status had both good and
>>>>> bad nodes at the same time. How do you replace nodes? When you say bad
>>>>> node, I understand that the node is no longer usable even though
>>>>> Cassandra is UP? Is that correct?
>>>>>
>>>>> If a node is in bad shape and not working, adding new node may trigger
>>>>> streaming huge data from bad node too. Have you considered using the
>>>>> procedure for replacing a dead node?
>>>>>
>>>>> Please share Latest nodetool status.
>>>>>
>>>>> nodetool output shared earlier:
>>>>>
>>>>> `nodetool status` output:
>>>>>
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> -- Address Load Tokens Owns Host
>>>>> ID Rack
>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>>
>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>> <ja...@idioplatform.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> We’ve spent a few days running things but are in the same position. To
>>>>> add some more flavour:
>>>>>
>>>>>
>>>>> - We have a 3-node ring, replication factor = 3. We’ve been
>>>>> running in this configuration for a few years without any real issues
>>>>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>> brought in to replace two other nodes which had failed RAID0 configuration
>>>>> and thus were lacking in disk space.
>>>>> - When node 2 was brought into the ring, it exhibited high CPU
>>>>> wait, IO and load metrics
>>>>> - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>>> bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>> levels. Those same stats on 3, however, sky-rocketed
>>>>> - We’ve confirmed configuration across all three nodes are
>>>>> identical and in line with the recommended production settings
>>>>> - We’ve run a full repair
>>>>> - Node 2 is currently running compactions, 1 & 3 aren’t and have
>>>>> no pending
>>>>> - There is no GC happening from what I can see. Node 1 has a GC
>>>>> log, but that’s not been written to since May last year
>>>>>
>>>>>
>>>>> What we’re seeing at the moment is similar and normal stats on nodes 1
>>>>> & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>
>>>>>
>>>>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>
>>>>>
>>>>> Can you recommend any next steps?
>>>>>
>>>>> Griff
>>>>>
>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Hi Vickrum,
>>>>>>
>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>
>>>>>> 1. Analyse the sar report to check system health: CPU, memory, swap,
>>>>>> disk, etc. The system seems to be overloaded; this is evident from
>>>>>> the mutation drops.
>>>>>>
>>>>>> 2. Make sure that all recommended Cassandra production settings
>>>>>> available on the Datastax site are applied; disable zone reclaim and
>>>>>> THP.
>>>>>>
>>>>>> 3. Run a full repair on the bad node and check data size. The node
>>>>>> owns the largest token range but has significantly less data; I doubt
>>>>>> that bootstrapping happened properly.
>>>>>>
>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>> compactions by reducing concurrent compactors or compaction
>>>>>> throughput.
>>>>>>
>>>>>> 5. Analyze logs to make sure bootstrapping happened without errors.
>>>>>>
>>>>>> 6. Look for other common performance problems such as GC pauses to
>>>>>> make sure that dropped mutations are not caused by GC pauses.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>>
>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>> <vi...@idioplatform.com> wrote:
>>>>>> # nodetool compactionstats
>>>>>> pending tasks: 22
>>>>>> compaction type keyspace table
>>>>>> completed total unit progress
>>>>>> Compaction production_analytics interactions
>>>>>> 240410213 161172668724 bytes 0.15%
>>>>>>
>>>>>> Compaction production_decisions decisions.decisions_q_idx
>>>>>> 120815385 226295183 bytes 53.39%
>>>>>> Active compaction remaining time : 2h39m58s
>>>>>>
>>>>>> Worth mentioning that compactions haven't been running on this node
>>>>>> particularly often. The node's been performing badly regardless of whether
>>>>>> it's compacting or not.
>>>>>>
>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>>>>>
>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>
>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>> issue.
>>>>>>>
>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>>>>> cause.
>>>>>>>
>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>>>> node (D) has less data.
>>>>>>>
>>>>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>>>>> than the rest.
>>>>>>>
>>>>>>> The only other difference between this node and the rest of the
>>>>>>> cluster is that it's on the ext4 filesystem, whereas the rest are
>>>>>>> ext3, but we've done plenty of testing there and can't see how that
>>>>>>> would affect performance on this node so much.
>>>>>>>
>>>>>>> Nothing of note in system.log.
>>>>>>>
>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> Vic
>>>>>>>
>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>
>>>>>>> Good node:
>>>>>>> Pool Name Active Pending Completed
>>>>>>> Blocked All time blocked
>>>>>>> ReadStage 0 0
>>>>>>> 46311521 0 0
>>>>>>> RequestResponseStage 0 0
>>>>>>> 23817366 0 0
>>>>>>> MutationStage 0 0
>>>>>>> 47389269 0 0
>>>>>>> ReadRepairStage 0 0
>>>>>>> 11108 0 0
>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>> 0 0 0
>>>>>>> GossipStage 0 0
>>>>>>> 5259908 0 0
>>>>>>> CacheCleanupExecutor 0 0
>>>>>>> 0 0 0
>>>>>>> MigrationStage 0 0
>>>>>>> 30 0 0
>>>>>>> MemoryMeter 0 0
>>>>>>> 16563 0 0
>>>>>>> FlushWriter 0 0
>>>>>>> 39637 0 26
>>>>>>> ValidationExecutor 0 0
>>>>>>> 19013 0 0
>>>>>>> InternalResponseStage 0 0
>>>>>>> 9 0 0
>>>>>>> AntiEntropyStage 0 0
>>>>>>> 38026 0 0
>>>>>>> MemtablePostFlusher 0 0
>>>>>>> 81740 0 0
>>>>>>> MiscStage 0 0
>>>>>>> 19196 0 0
>>>>>>> PendingRangeCalculator 0 0
>>>>>>> 23 0 0
>>>>>>> CompactionExecutor 0 0
>>>>>>> 61629 0 0
>>>>>>> commitlog_archiver 0 0
>>>>>>> 0 0 0
>>>>>>> HintedHandoff 0 0
>>>>>>> 63 0 0
>>>>>>>
>>>>>>> Message type Dropped
>>>>>>> RANGE_SLICE 0
>>>>>>> READ_REPAIR 0
>>>>>>> PAGED_RANGE 0
>>>>>>> BINARY 0
>>>>>>> READ 640
>>>>>>> MUTATION 0
>>>>>>> _TRACE 0
>>>>>>> REQUEST_RESPONSE 0
>>>>>>> COUNTER_MUTATION 0
>>>>>>>
>>>>>>> Bad node:
>>>>>>> Pool Name Active Pending Completed
>>>>>>> Blocked All time blocked
>>>>>>> ReadStage 32 113
>>>>>>> 52216 0 0
>>>>>>> RequestResponseStage 0 0
>>>>>>> 4167 0 0
>>>>>>> MutationStage 0 0
>>>>>>> 127559 0 0
>>>>>>> ReadRepairStage 0 0
>>>>>>> 125 0 0
>>>>>>> ReplicateOnWriteStage 0 0
>>>>>>> 0 0 0
>>>>>>> GossipStage 0 0
>>>>>>> 9965 0 0
>>>>>>> CacheCleanupExecutor 0 0
>>>>>>> 0 0 0
>>>>>>> MigrationStage 0 0
>>>>>>> 0 0 0
>>>>>>> MemoryMeter 0 0
>>>>>>> 24 0 0
>>>>>>> FlushWriter 0 0
>>>>>>> 27 0 1
>>>>>>> ValidationExecutor 0 0
>>>>>>> 0 0 0
>>>>>>> InternalResponseStage 0 0
>>>>>>> 0 0 0
>>>>>>> AntiEntropyStage 0 0
>>>>>>> 0 0 0
>>>>>>> MemtablePostFlusher 0 0
>>>>>>> 96 0 0
>>>>>>> MiscStage 0 0
>>>>>>> 0 0 0
>>>>>>> PendingRangeCalculator 0 0
>>>>>>> 10 0 0
>>>>>>> CompactionExecutor 1 1
>>>>>>> 73 0 0
>>>>>>> commitlog_archiver 0 0
>>>>>>> 0 0 0
>>>>>>> HintedHandoff 0 0
>>>>>>> 15 0 0
>>>>>>>
>>>>>>> Message type Dropped
>>>>>>> RANGE_SLICE 130
>>>>>>> READ_REPAIR 1
>>>>>>> PAGED_RANGE 0
>>>>>>> BINARY 0
>>>>>>> READ 31032
>>>>>>> MUTATION 865
>>>>>>> _TRACE 0
>>>>>>> REQUEST_RESPONSE 7
>>>>>>> COUNTER_MUTATION 0
>>>>>>>
>>>>>>>
>>>>>>> [1] `nodetool status` output:
>>>>>>>
>>>>>>> Status=Up/Down
>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>> -- Address Load Tokens Owns Host
>>>>>>> ID Rack
>>>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>>>
>>>>>>> [2] Disk read/write ops:
>>>>>>>
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>
>>>>>>> [3] Network in/out:
>>>>>>>
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Re: New node has high network and disk usage.
Posted by Kai Wang <de...@gmail.com>.
James,
Can you post the result of "nodetool netstats" on the bad node?
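A minimal sketch of skimming that output for streaming state and read-repair mismatches. The heredoc below is a fabricated stand-in shaped like Cassandra 2.x `nodetool netstats` output; on the bad node you would run `nodetool netstats > netstats.txt` instead:

```shell
# Fabricated sample standing in for real `nodetool netstats` output
cat > netstats.txt <<'EOF'
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 11108
Mismatch (Blocking): 2
Mismatch (Background): 7
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         104233
Responses                       n/a         0          98771
EOF
# Streaming state plus read-repair mismatches at a glance
grep -E 'Mode:|streams|Mismatch' netstats.txt
```

An unexpected stream to or from the bad node here would point back at the bootstrap/replacement concerns raised earlier in the thread.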
On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
james.griffin@idioplatform.com> wrote:
> A summary of what we've done this morning:
>
> - Noted that there are no GCInspector lines in system.log on bad node
> (there are GCInspector logs on other healthy nodes)
> - Turned on GC logging; noted logs stating our total time for which
> application threads were stopped was high - ~10s
> - Not seeing failures of any kind (promotion or concurrent mark)
> - Attached VisualVM: noted that heap usage was very low (~5% usage
> and stable) and it didn't display the hallmarks of GC activity. PermGen also
> very stable
> - Downloaded GC logs and examined in GC Viewer. Noted that:
> - We had lots of pauses (again around 10s), but no full GC.
> - From a 2,300s sample, just over 2,000s were spent with threads
> paused
> - Spotted many small GCs in the new space - realised that Xmn value
> was very low (200M against a heap size of 3750M). Increased Xmn to 937M -
> no change in server behaviour (high load, high reads/s on disk, high CPU
> wait)
>
> Current output of jstat:
>
> S0 S1 E O P YGC YGCT FGC FGCT GCT
> 2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039 63.724
> 3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103 1.915
>
> Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise than 2
> (which has normal load statistics).
>
> Anywhere else you can recommend we look?
>
> Griff
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
A summary of what we've done this morning:
- Noted that there are no GCInspector lines in system.log on bad node
(there are GCInspector logs on other healthy nodes)
- Turned on GC logging; noted logs stating our total time for which
application threads were stopped was high - ~10s
- Not seeing failures of any kind (promotion or concurrent mark)
- Attached VisualVM: noted that heap usage was very low (~5% usage and
stable) and it didn't display the hallmarks of GC activity. PermGen also very
stable
- Downloaded GC logs and examined in GC Viewer. Noted that:
- We had lots of pauses (again around 10s), but no full GC.
- From a 2,300s sample, just over 2,000s were spent with threads
paused
- Spotted many small GCs in the new space - realised that Xmn value
was very low (200M against a heap size of 3750M). Increased Xmn to 937M -
no change in server behaviour (high load, high reads/s on disk, high CPU
wait)
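Those new-gen numbers can be sanity-checked quickly. Figures are taken from this message; the ~25%-of-heap target is a common CMS rule of thumb, not a hard requirement:

```shell
# Express each Xmn value as an integer percentage of the 3750M heap
heap_mb=3750
for xmn_mb in 200 937; do
  echo "Xmn ${xmn_mb}M = $(( xmn_mb * 100 / heap_mb ))% of heap"
done
# 200M is ~5% of the heap; 937M brings it to ~24%, close to heap/4
```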
Current output of jstat:
S0 S1 E O P YGC YGCT FGC FGCT GCT
2 0.00 45.20 12.82 26.84 76.21 2333 63.684 2 0.039 63.724
3 63.58 0.00 33.68 8.04 75.19 14 1.812 2 0.103 1.915
Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise than 2
(which has normal load statistics).
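Dividing total young-GC time (YGCT, seconds) by young-GC count (YGC) makes the comparison concrete; a small sketch over the figures above:

```shell
# node YGC YGCT, copied from the jstat output above
cat > jstat.txt <<'EOF'
2 2333 63.684
3 14 1.812
EOF
awk '{ printf "node %s: %d young GCs, avg pause %.1f ms\n", $1, $2, $3/$2*1000 }' jstat.txt
```

Node 3 has run far fewer young collections, but its average pause (~129 ms) is several times node 2's (~27 ms), so per-collection GC work on 3 is actually slower.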
Anywhere else you can recommend we look?
Griff
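One way to put numbers on GCInspector frequency per node, sketched against a fabricated 2.x-style log; the real check would point at each node's system.log:

```shell
# Fabricated placeholder lines shaped like Cassandra 2.x GCInspector output
cat > system.log <<'EOF'
INFO GCInspector.java ParNew GC in 245ms
INFO GCInspector.java ParNew GC in 310ms
INFO GCInspector.java ConcurrentMarkSweep GC in 4120ms
EOF
# Count minor (ParNew) vs old-gen (ConcurrentMarkSweep) collections logged
echo "minor: $(grep -c 'ParNew' system.log)"
echo "old-gen: $(grep -c 'ConcurrentMarkSweep' system.log)"
```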
On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in> wrote:
> Ok. I saw dropped mutations on your cluster, and full gc is a common cause
> of that.
> Can you just search for the word GCInspector in system.log and share the
> frequency of minor and full gc? Moreover, are you printing promotion
> failures in gc logs? Why is full gc getting triggered: promotion failures
> or concurrent mode failures?
>
> If you are on CMS, you need to fine tune your heap options to address full
> gc.
>
>
>
> Thanks
> Anuj
>
>
> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
> <ja...@idioplatform.com> wrote:
> I think I was incorrect in assuming GC wasn't an issue due to the lack of
> logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
> differences, though comparing the startup flags on the two machines shows
> the GC config is identical:
>
> $ jstat -gcutil
> S0 S1 E O P YGC YGCT FGC FGCT GCT
> 2 5.08 0.00 55.72 18.24 59.90 25986 619.827 28 1.597 621.424
> 3 0.00 0.00 22.79 17.87 59.99 422600 11225.979 668 57.383
> 11283.361
>
> Here's typical output for iostat on nodes 2 & 3 as well:
>
> $ iostat -dmx md0
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s
> avgrq-sz avgqu-sz await r_await w_await svctm %util
> 2 md0 0.00 0.00 339.00 0.00 9.77 0.00
> 59.00 0.00 0.00 0.00 0.00 0.00 0.00
> 3 md0 0.00 0.00 2069.00 1.00 85.85 0.00
> 84.94 0.00 0.00 0.00 0.00 0.00 0.00
>
> Griff
>
> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in> wrote:
>
>> Node 2 has slightly higher data but that should be OK. Not sure how read
>> ops are so high when no IO-intensive activity such as repair or compaction
>> is running on node 3. Maybe you can try investigating the logs to see
>> what's happening.
>>
>> Others on the mailing list could also share their views on the situation.
>>
>> Thanks
>> Anuj
>>
>>
>>
>>
>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>> <ja...@idioplatform.com> wrote:
>> Hi Anuj,
>>
>> Below is the output of nodetool status. The nodes were replaced following
>> the instructions in the Datastax documentation for replacing running
>> nodes, since the nodes were running fine; it was just that the servers had
>> been incorrectly initialised and thus had less disk space. The status
>> below shows 2 has significantly higher load, however, as I say, 2 is
>> operating normally and is running compactions, so I guess that's not an
>> issue?
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address Load Tokens Owns Host ID
>> Rack
>> UN 1 253.59 GB 256 31.7%
>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>> UN 2 302.23 GB 256 35.3%
>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>> UN 3 265.02 GB 256 33.1%
>> 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>>
>> Griff
>>
>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>
>>> Hi,
>>>
>>> Revisiting the thread I can see that nodetool status had both good and
>>> bad nodes at the same time. How do you replace nodes? When you say bad
>>> node, I understand that the node is no longer usable even though
>>> Cassandra is UP? Is that correct?
>>>
>>> If a node is in bad shape and not working, adding new node may trigger
>>> streaming huge data from bad node too. Have you considered using the
>>> procedure for replacing a dead node?
>>>
>>> Please share Latest nodetool status.
>>>
>>> nodetool output shared earlier:
>>>
>>> `nodetool status` output:
>>>
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> -- Address Load Tokens Owns Host
>>> ID Rack
>>> UN A (Good) 252.37 GB 256 23.0%
>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>> UN B (Good) 245.91 GB 256 24.4%
>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>> UN C (Good) 254.79 GB 256 23.7%
>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>> UN D (Bad) 163.85 GB 256 28.8%
>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>>
>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>> <ja...@idioplatform.com> wrote:
>>> Hi all,
>>>
>>> We’ve spent a few days running things but are in the same position. To
>>> add some more flavour:
>>>
>>>
>>> - We have a 3-node ring, replication factor = 3. We’ve been running
>>> in this configuration for a few years without any real issues
>>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>> brought in to replace two other nodes which had failed RAID0 configuration
>>> and thus were lacking in disk space.
>>> - When node 2 was brought into the ring, it exhibited high CPU wait,
>>> IO and load metrics
>>> - We subsequently brought 3 into the ring: as soon as 3 was fully
>>> bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>> levels. Those same stats on 3, however, sky-rocketed
>>> - We’ve confirmed configuration across all three nodes are identical
>>> and in line with the recommended production settings
>>> - We’ve run a full repair
>>> - Node 2 is currently running compactions, 1 & 3 aren’t and have no
>>> pending
>>> - There is no GC happening from what I can see. Node 1 has a GC log,
>>> but that’s not been written to since May last year
>>>
>>>
>>> What we’re seeing at the moment is similar and normal stats on nodes 1 &
>>> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>
>>>
>>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>
>>>
>>> Can you recommend any next steps?
>>>
>>> Griff
>>>
>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>>
>>>> Hi Vickrum,
>>>>
>>>> I would have proceeded with diagnosis as follows:
>>>>
>>>> 1. Analyze the sar report to check system health (CPU, memory, swap,
>>>> disk, etc.). The system seems to be overloaded; this is evident from
>>>> the mutation drops.
>>>>
>>>> 2. Make sure all of the recommended Cassandra production settings from
>>>> the DataStax site are applied; disable zone reclaim and THP.
>>>>
>>>> 3. Run a full repair on the bad node and check the data size. The node
>>>> owns the largest token range but has significantly less data; I doubt
>>>> bootstrapping happened properly.
>>>>
>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>> compactions by reducing concurrent compactors or the compaction
>>>> throughput.
>>>>
>>>> 5. Analyze the logs to make sure bootstrapping happened without errors.
>>>>
>>>> 6. Look for other common performance problems, such as GC pauses, to
>>>> make sure the dropped mutations are not caused by GC pauses.
>>>>
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>> <vi...@idioplatform.com> wrote:
>>>> # nodetool compactionstats
>>>> pending tasks: 22
>>>> compaction type keyspace table
>>>> completed total unit progress
>>>> Compactionproduction_analytics interactions
>>>> 240410213 161172668724 bytes 0.15%
>>>>
>>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>>> 120815385 226295183 bytes 53.39%
>>>> Active compaction remaining time : 2h39m58s
>>>>
>>>> Worth mentioning that compactions haven't been running on this node
>>>> particularly often. The node's been performing badly regardless of whether
>>>> it's compacting or not.
>>>>
>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>>>
>>>>> What’s your output of `nodetool compactionstats`?
>>>>>
>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We recently added a new node to our cluster in order to replace a node
>>>>> that died (hardware failure we believe). For the next two weeks it had high
>>>>> disk and network activity. We replaced the server, but it's happened again.
>>>>> We've looked into memory allowances, disk performance, number of
>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>> issue.
>>>>>
>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>>> cause.
>>>>>
>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>> node (D) has less data.
>>>>>
>>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>>> than the rest.
>>>>>
>>>>> The only other difference this node has from the rest of the cluster is
>>>>> that it's on the ext4 filesystem, whereas the rest are ext3, but we've done
>>>>> plenty of testing there and can't see how that would affect performance on
>>>>> this node so much.
>>>>>
>>>>> Nothing of note in system.log.
>>>>>
>>>>> What should our next step be in trying to diagnose this issue?
>>>>>
>>>>> Best wishes,
>>>>> Vic
>>>>>
>>>>> [0] `nodetool tpstats` output:
>>>>>
>>>>> Good node:
>>>>> Pool Name Active Pending Completed
>>>>> Blocked All time blocked
>>>>> ReadStage 0 0
>>>>> 46311521 0 0
>>>>> RequestResponseStage 0 0
>>>>> 23817366 0 0
>>>>> MutationStage 0 0
>>>>> 47389269 0 0
>>>>> ReadRepairStage 0 0
>>>>> 11108 0 0
>>>>> ReplicateOnWriteStage 0 0
>>>>> 0 0 0
>>>>> GossipStage 0 0
>>>>> 5259908 0 0
>>>>> CacheCleanupExecutor 0 0
>>>>> 0 0 0
>>>>> MigrationStage 0 0
>>>>> 30 0 0
>>>>> MemoryMeter 0 0
>>>>> 16563 0 0
>>>>> FlushWriter 0 0
>>>>> 39637 0 26
>>>>> ValidationExecutor 0 0
>>>>> 19013 0 0
>>>>> InternalResponseStage 0 0
>>>>> 9 0 0
>>>>> AntiEntropyStage 0 0
>>>>> 38026 0 0
>>>>> MemtablePostFlusher 0 0
>>>>> 81740 0 0
>>>>> MiscStage 0 0
>>>>> 19196 0 0
>>>>> PendingRangeCalculator 0 0
>>>>> 23 0 0
>>>>> CompactionExecutor 0 0
>>>>> 61629 0 0
>>>>> commitlog_archiver 0 0
>>>>> 0 0 0
>>>>> HintedHandoff 0 0
>>>>> 63 0 0
>>>>>
>>>>> Message type Dropped
>>>>> RANGE_SLICE 0
>>>>> READ_REPAIR 0
>>>>> PAGED_RANGE 0
>>>>> BINARY 0
>>>>> READ 640
>>>>> MUTATION 0
>>>>> _TRACE 0
>>>>> REQUEST_RESPONSE 0
>>>>> COUNTER_MUTATION 0
>>>>>
>>>>> Bad node:
>>>>> Pool Name Active Pending Completed
>>>>> Blocked All time blocked
>>>>> ReadStage 32 113
>>>>> 52216 0 0
>>>>> RequestResponseStage 0 0
>>>>> 4167 0 0
>>>>> MutationStage 0 0
>>>>> 127559 0 0
>>>>> ReadRepairStage 0 0
>>>>> 125 0 0
>>>>> ReplicateOnWriteStage 0 0
>>>>> 0 0 0
>>>>> GossipStage 0 0
>>>>> 9965 0 0
>>>>> CacheCleanupExecutor 0 0
>>>>> 0 0 0
>>>>> MigrationStage 0 0
>>>>> 0 0 0
>>>>> MemoryMeter 0 0
>>>>> 24 0 0
>>>>> FlushWriter 0 0
>>>>> 27 0 1
>>>>> ValidationExecutor 0 0
>>>>> 0 0 0
>>>>> InternalResponseStage 0 0
>>>>> 0 0 0
>>>>> AntiEntropyStage 0 0
>>>>> 0 0 0
>>>>> MemtablePostFlusher 0 0
>>>>> 96 0 0
>>>>> MiscStage 0 0
>>>>> 0 0 0
>>>>> PendingRangeCalculator 0 0
>>>>> 10 0 0
>>>>> CompactionExecutor 1 1
>>>>> 73 0 0
>>>>> commitlog_archiver 0 0
>>>>> 0 0 0
>>>>> HintedHandoff 0 0
>>>>> 15 0 0
>>>>>
>>>>> Message type Dropped
>>>>> RANGE_SLICE 130
>>>>> READ_REPAIR 1
>>>>> PAGED_RANGE 0
>>>>> BINARY 0
>>>>> READ 31032
>>>>> MUTATION 865
>>>>> _TRACE 0
>>>>> REQUEST_RESPONSE 7
>>>>> COUNTER_MUTATION 0
>>>>>
>>>>>
>>>>> [1] `nodetool status` output:
>>>>>
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> -- Address Load Tokens Owns Host
>>>>> ID Rack
>>>>> UN A (Good) 252.37 GB 256 23.0%
>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>>> UN B (Good) 245.91 GB 256 24.4%
>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>>> UN C (Good) 254.79 GB 256 23.7%
>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>>
>>>>> [2] Disk read/write ops:
>>>>>
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>
>>>>> [3] Network in/out:
>>>>>
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
Re: New node has high network and disk usage.
Posted by Anuj Wadehra <an...@yahoo.co.in>.
Ok. I saw dropped mutations on your cluster, and full GC is a common cause of that. Can you search for the word GCInspector in system.log and share the frequency of minor and full GCs? Moreover, are you printing promotion failures in the GC logs? Why is full GC getting triggered: promotion failures or concurrent mode failures?
If you are on CMS, you need to fine-tune your heap options to address full GC.
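For the GCInspector check, something like this works (a sketch: the two sample lines below stand in for a real system.log, and the log path and the 2.0-era message format are assumptions):

```shell
# Count minor (ParNew) vs full (ConcurrentMarkSweep) collections reported
# by GCInspector. The sample lines stand in for /var/log/cassandra/system.log.
cat > /tmp/system.log.sample <<'EOF'
 INFO [ScheduledTasks:1] 2016-01-13 10:00:01,000 GCInspector.java (line 116) GC for ParNew: 210 ms for 1 collections
 INFO [ScheduledTasks:1] 2016-01-13 10:05:42,000 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 4512 ms for 1 collections
EOF
echo "minor GCs: $(grep -c 'GC for ParNew' /tmp/system.log.sample)"
echo "full GCs:  $(grep -c 'GC for ConcurrentMarkSweep' /tmp/system.log.sample)"
```

Point the greps at the real system.log on each node and compare the counts across the ring.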
Thanks
Anuj
Sent from Yahoo Mail on Android
On Thu, 14 Jan, 2016 at 12:57 am, James Griffin <ja...@idioplatform.com> wrote:

I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, though comparing the startup flags on the two machines shows the GC config is identical:

$ jstat -gcutil
   S0    S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
2  5.08  0.00  55.72  18.24  59.90   25986   619.827   28   1.597    621.424
3  0.00  0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
Here's typical output for iostat on nodes 2 & 3 as well:
$ iostat -dmx md0
Device: rrqm/s wrqm/s     r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2 md0      0.00   0.00  339.00  0.00   9.77   0.00    59.00     0.00  0.00    0.00    0.00  0.00  0.00
3 md0      0.00   0.00 2069.00  1.00  85.85   0.00    84.94     0.00  0.00    0.00    0.00  0.00  0.00
Griff
On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in> wrote:
Node 2 has slightly more data, but that should be OK. I'm not sure how read ops are so high when no IO-intensive activity, such as repair or compaction, is running on node 3. Maybe you can try investigating the logs to see what's happening.
Others on the mailing list could also share their views on the situation.
Thanks
Anuj
Sent from Yahoo Mail on Android
On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <ja...@idioplatform.com> wrote:

Hi Anuj,
Below is the output of nodetool status. The nodes were replaced following the instructions in the DataStax documentation for replacing running nodes, since the nodes were running fine; it was just that the servers had been incorrectly initialised and thus had less disk space. The status below shows node 2 has significantly higher load, but as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID                               Rack
UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
Griff
On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:
Hi,
Revisiting the thread, I can see that nodetool status showed both good and bad nodes at the same time. How do you replace nodes? When you say "bad node", I understand that the node is no longer usable even though Cassandra is UP; is that correct?
If a node is in bad shape and not working, adding a new node may trigger streaming huge amounts of data from the bad node too. Have you considered using the procedure for replacing a dead node?
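For reference, the dead-node procedure hinges on the replace_address JVM flag on the replacement node; a minimal sketch (the IP and the cassandra-env.sh path below are illustrative placeholders, not values from this thread):

```shell
# On the NEW node, before its first start, tell it which dead node it replaces.
# /tmp/cassandra-env.sh and 10.0.0.4 are placeholders for illustration.
CONF=/tmp/cassandra-env.sh
touch "$CONF"
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.4"' >> "$CONF"
grep 'replace_address' "$CONF"
```

Once the replacement finishes bootstrapping, remove the flag before any later restart.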
Please share the latest nodetool status.
nodetool output shared earlier:
`nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
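Scaling each node's load by its ownership makes the shortfall on D concrete (a quick sketch over the figures above):

```shell
# GB of data per percent of ring owned; a balanced ring should give
# roughly equal ratios across nodes. D's ratio is about half the others'.
awk 'BEGIN {
  printf "A  %.1f\n", 252.37 / 23.0;
  printf "B  %.1f\n", 245.91 / 24.4;
  printf "C  %.1f\n", 254.79 / 23.7;
  printf "D  %.1f\n", 163.85 / 28.8;
}'
```

A, B, and C come out around 10 to 11 GB per percent owned; D is about 5.7, consistent with an incomplete bootstrap.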
Thanks
Anuj
Sent from Yahoo Mail on Android
On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin <ja...@idioplatform.com> wrote:

Hi all,
We’ve spent a few days running things but are in the same position. To add some more flavour:
- We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
- Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had failed RAID0 configuration and thus were lacking in disk space.
- When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
- We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
- We’ve confirmed the configurations across all three nodes are identical and in line with the recommended production settings
- We’ve run a full repair
- Node 2 is currently running compactions, 1 & 3 aren’t and have no pending
- There is no GC happening from what I can see. Node 1 has a GC log, but that’s not been written to since May last year
What we’re seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
- Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
- Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
- Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
Can you recommend any next steps?
Griff
On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analyze the sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure all of the recommended Cassandra production settings from the DataStax site are applied; disable zone reclaim and THP.
3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt bootstrapping happened properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyze the logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems, such as GC pauses, to make sure the dropped mutations are not caused by GC pauses.
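Steps 1, 2, and 4 above can be scripted as a quick checklist (a sketch: the tool names assume a standard Linux plus Cassandra install, each command degrades to a note when the tool is absent, and 16 MB/s is an illustrative throttle rather than a recommendation from this thread):

```shell
#!/bin/sh
# Print each check's output, or a note when the tool/path is unavailable.
run() { echo "== $*"; "$@" 2>/dev/null || echo "   (not available on this host)"; }

run sar -u 1 3                                       # CPU and iowait snapshot
run cat /sys/kernel/mm/transparent_hugepage/enabled  # want: [never]
run cat /proc/sys/vm/zone_reclaim_mode               # want: 0
run nodetool compactionstats                         # pending compactions
run nodetool setcompactionthroughput 16              # throttle compaction I/O
```

Run it on each node and diff the output between the good and bad nodes.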
Thanks
Anuj
Sent from Yahoo Mail on Android
On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vi...@idioplatform.com> wrote:

# nodetool compactionstats
pending tasks: 22
compaction type  keyspace              table                      completed  total         unit   progress
     Compaction  production_analytics  interactions               240410213  161172668724  bytes  0.15%
     Compaction  production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
Active compaction remaining time : 2h39m58s
Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.
On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
What’s your output of `nodetool compactionstats`?
On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,
We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.
`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.
`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.
Disk Activity[2] and Network activity[3] on this node is far higher than the rest.
The only other difference this node has from the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.
Nothing of note in system.log.
What should our next step be in trying to diagnose this issue?
Best wishes,
Vic
[0] `nodetool tpstats` output:
Good node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 46311521 0 0
RequestResponseStage 0 0 23817366 0 0
MutationStage 0 0 47389269 0 0
ReadRepairStage 0 0 11108 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 5259908 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 30 0 0
MemoryMeter 0 0 16563 0 0
FlushWriter 0 0 39637 0 26
ValidationExecutor 0 0 19013 0 0
InternalResponseStage 0 0 9 0 0
AntiEntropyStage 0 0 38026 0 0
MemtablePostFlusher 0 0 81740 0 0
MiscStage 0 0 19196 0 0
PendingRangeCalculator 0 0 23 0 0
CompactionExecutor 0 0 61629 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 63 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 640
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Bad node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 32 113 52216 0 0
RequestResponseStage 0 0 4167 0 0
MutationStage 0 0 127559 0 0
ReadRepairStage 0 0 125 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 9965 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemoryMeter 0 0 24 0 0
FlushWriter 0 0 27 0 1
ValidationExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MemtablePostFlusher 0 0 96 0 0
MiscStage 0 0 0 0 0
PendingRangeCalculator 0 0 10 0 0
CompactionExecutor 1 1 73 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 15 0 0
Message type Dropped
RANGE_SLICE 130
READ_REPAIR 1
PAGED_RANGE 0
BINARY 0
READ 31032
MUTATION 865
_TRACE 0
REQUEST_RESPONSE 7
COUNTER_MUTATION 0
[1] `nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
[2] Disk read/write ops:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
I think I was incorrect in assuming GC wasn't an issue due to the lack of
logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
differences, though comparing the startup flags on the two machines shows
the GC config is identical:
$ jstat -gcutil
   S0    S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
2  5.08  0.00  55.72  18.24  59.90   25986   619.827   28   1.597    621.424
3  0.00  0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
Here's typical output for iostat on nodes 2 & 3 as well:
$ iostat -dmx md0
Device: rrqm/s wrqm/s     r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
2 md0      0.00   0.00  339.00  0.00   9.77   0.00    59.00     0.00  0.00    0.00    0.00  0.00  0.00
3 md0      0.00   0.00 2069.00  1.00  85.85   0.00    84.94     0.00  0.00    0.00    0.00  0.00  0.00
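A quick division over those jstat columns (a sketch) suggests the per-event pauses are similar; it is the volume that differs: node 3 has run roughly 16x the young collections and has spent roughly 18x longer in GC overall.

```shell
# Average pause per young/full GC and total GC seconds, from the table above.
awk 'BEGIN {
  printf "node2: %.1f ms/YGC  %.1f ms/FGC  %.0f s total GC\n",
         1000 * 619.827 / 25986,   1000 * 1.597 / 28,   621.424;
  printf "node3: %.1f ms/YGC  %.1f ms/FGC  %.0f s total GC\n",
         1000 * 11225.979 / 422600, 1000 * 57.383 / 668, 11283.361;
}'
```

That pattern points at allocation pressure (many short collections) rather than individual long pauses.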
Griff
On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in> wrote:
> Node 2 has slightly more data, but that should be OK. Not sure how read
> ops are so high when no IO-intensive activity such as repair or compaction
> is running on node 3. Maybe you can try investigating the logs to see
> what's happening.
>
> Others on the mailing list could also share their views on the situation.
>
> Thanks
> Anuj
>
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
> <ja...@idioplatform.com> wrote:
> Hi Anuj,
>
> Below is the output of nodetool status. The nodes were replaced following
> the instructions in the DataStax documentation for replacing running nodes,
> since the nodes were running fine; it was just that the servers had been
> incorrectly initialised and thus had less disk space. The status below
> shows node 2 has significantly higher load, but as I say, 2 is operating
> normally and is running compactions, so I guess that's not an issue?
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns Host ID
> Rack
> UN 1 253.59 GB 256 31.7%
> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
> UN 2 302.23 GB 256 35.3%
> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
> UN 3 265.02 GB 256 33.1%
> 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
>
> Griff
>
> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:
>
>> Hi,
>>
>> Revisiting the thread, I can see that nodetool status showed both good and
>> bad nodes at the same time. How do you replace nodes? When you say "bad
>> node", I understand that the node is no longer usable even though Cassandra
>> is UP; is that correct?
>>
>> If a node is in bad shape and not working, adding a new node may trigger
>> streaming huge amounts of data from the bad node too. Have you considered
>> using the procedure for replacing a dead node?
>>
>> Please share the latest nodetool status.
>>
>> nodetool output shared earlier:
>>
>> `nodetool status` output:
>>
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address Load Tokens Owns Host
>> ID Rack
>> UN A (Good) 252.37 GB 256 23.0%
>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>> UN B (Good) 245.91 GB 256 24.4%
>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>> UN C (Good) 254.79 GB 256 23.7%
>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>> UN D (Bad) 163.85 GB 256 28.8%
>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>> <ja...@idioplatform.com> wrote:
>> Hi all,
>>
>> We’ve spent a few days running things but are in the same position. To
>> add some more flavour:
>>
>>
>> - We have a 3-node ring, replication factor = 3. We’ve been running
>> in this configuration for a few years without any real issues
>> - Nodes 2 & 3 are much newer than node 1. These two nodes were
>> brought in to replace two other nodes which had failed RAID0 configuration
>> and thus were lacking in disk space.
>> - When node 2 was brought into the ring, it exhibited high CPU wait,
>> IO and load metrics
>> - We subsequently brought 3 into the ring: as soon as 3 was fully
>> bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>> levels. Those same stats on 3, however, sky-rocketed
>> - We’ve confirmed the configurations across all three nodes are
>> identical and in line with the recommended production settings
>> - We’ve run a full repair
>> - Node 2 is currently running compactions, 1 & 3 aren’t and have no
>> pending
>> - There is no GC happening from what I can see. Node 1 has a GC log,
>> but that’s not been written to since May last year
>>
>>
>> What we’re seeing at the moment is similar and normal stats on nodes 1 &
>> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>
>>
>> 1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>> 2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>> 3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>
>>
>> Can you recommend any next steps?
>>
>> Griff
>>
>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>
>>> Hi Vickrum,
>>>
>>> I would have proceeded with diagnosis as follows:
>>>
>>> 1. Analyze the sar report to check system health (CPU, memory, swap,
>>> disk, etc.). The system seems to be overloaded; this is evident from
>>> the mutation drops.
>>>
>>> 2. Make sure all of the recommended Cassandra production settings from
>>> the DataStax site are applied; disable zone reclaim and THP.
>>>
>>> 3. Run a full repair on the bad node and check the data size. The node
>>> owns the largest token range but has significantly less data; I doubt
>>> bootstrapping happened properly.
>>>
>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>> compactions by reducing concurrent compactors or the compaction
>>> throughput.
>>>
>>> 5. Analyze the logs to make sure bootstrapping happened without errors.
>>>
>>> 6. Look for other common performance problems, such as GC pauses, to
>>> make sure the dropped mutations are not caused by GC pauses.
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>> <vi...@idioplatform.com> wrote:
>>> # nodetool compactionstats
>>> pending tasks: 22
>>> compaction type keyspace table
>>> completed total unit progress
>>> Compactionproduction_analytics interactions
>>> 240410213 161172668724 bytes 0.15%
>>>
>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>> 120815385 226295183 bytes 53.39%
>>> Active compaction remaining time : 2h39m58s
>>>
>>> Worth mentioning that compactions haven't been running on this node
>>> particularly often. The node's been performing badly regardless of whether
>>> it's compacting or not.
>>>
>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>>
>>>> What’s your output of `nodetool compactionstats`?
>>>>
>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We recently added a new node to our cluster in order to replace a node
>>>> that died (hardware failure we believe). For the next two weeks it had high
>>>> disk and network activity. We replaced the server, but it's happened again.
>>>> We've looked into memory allowances, disk performance, number of
>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>> issue.
>>>>
>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>> cause.
>>>>
>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>> node (D) has less data.
>>>>
>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>> than the rest.
>>>>
>>>> The only other difference this node has from the rest of the cluster is
>>>> that it's on the ext4 filesystem, whereas the rest are ext3, but we've done
>>>> plenty of testing there and can't see how that would affect performance on
>>>> this node so much.
>>>>
>>>> Nothing of note in system.log.
>>>>
>>>> What should our next step be in trying to diagnose this issue?
>>>>
>>>> Best wishes,
>>>> Vic
>>>>
>>>> [0] `nodetool tpstats` output:
>>>>
>>>> Good node:
>>>> Pool Name Active Pending Completed
>>>> Blocked All time blocked
>>>> ReadStage 0 0
>>>> 46311521 0 0
>>>> RequestResponseStage 0 0
>>>> 23817366 0 0
>>>> MutationStage 0 0
>>>> 47389269 0 0
>>>> ReadRepairStage 0 0
>>>> 11108 0 0
>>>> ReplicateOnWriteStage 0 0
>>>> 0 0 0
>>>> GossipStage 0 0
>>>> 5259908 0 0
>>>> CacheCleanupExecutor 0 0
>>>> 0 0 0
>>>> MigrationStage 0 0
>>>> 30 0 0
>>>> MemoryMeter 0 0
>>>> 16563 0 0
>>>> FlushWriter 0 0
>>>> 39637 0 26
>>>> ValidationExecutor 0 0
>>>> 19013 0 0
>>>> InternalResponseStage 0 0
>>>> 9 0 0
>>>> AntiEntropyStage 0 0
>>>> 38026 0 0
>>>> MemtablePostFlusher 0 0
>>>> 81740 0 0
>>>> MiscStage 0 0
>>>> 19196 0 0
>>>> PendingRangeCalculator 0 0
>>>> 23 0 0
>>>> CompactionExecutor 0 0
>>>> 61629 0 0
>>>> commitlog_archiver 0 0
>>>> 0 0 0
>>>> HintedHandoff 0 0
>>>> 63 0 0
>>>>
>>>> Message type Dropped
>>>> RANGE_SLICE 0
>>>> READ_REPAIR 0
>>>> PAGED_RANGE 0
>>>> BINARY 0
>>>> READ 640
>>>> MUTATION 0
>>>> _TRACE 0
>>>> REQUEST_RESPONSE 0
>>>> COUNTER_MUTATION 0
>>>>
>>>> Bad node:
>>>> Pool Name Active Pending Completed
>>>> Blocked All time blocked
>>>> ReadStage 32 113
>>>> 52216 0 0
>>>> RequestResponseStage 0 0
>>>> 4167 0 0
>>>> MutationStage 0 0
>>>> 127559 0 0
>>>> ReadRepairStage 0 0
>>>> 125 0 0
>>>> ReplicateOnWriteStage 0 0
>>>> 0 0 0
>>>> GossipStage 0 0
>>>> 9965 0 0
>>>> CacheCleanupExecutor 0 0
>>>> 0 0 0
>>>> MigrationStage 0 0
>>>> 0 0 0
>>>> MemoryMeter 0 0
>>>> 24 0 0
>>>> FlushWriter 0 0
>>>> 27 0 1
>>>> ValidationExecutor 0 0
>>>> 0 0 0
>>>> InternalResponseStage 0 0
>>>> 0 0 0
>>>> AntiEntropyStage 0 0
>>>> 0 0 0
>>>> MemtablePostFlusher 0 0
>>>> 96 0 0
>>>> MiscStage 0 0
>>>> 0 0 0
>>>> PendingRangeCalculator 0 0
>>>> 10 0 0
>>>> CompactionExecutor 1 1
>>>> 73 0 0
>>>> commitlog_archiver 0 0
>>>> 0 0 0
>>>> HintedHandoff 0 0
>>>> 15 0 0
>>>>
>>>> Message type Dropped
>>>> RANGE_SLICE 130
>>>> READ_REPAIR 1
>>>> PAGED_RANGE 0
>>>> BINARY 0
>>>> READ 31032
>>>> MUTATION 865
>>>> _TRACE 0
>>>> REQUEST_RESPONSE 7
>>>> COUNTER_MUTATION 0
>>>>
>>>>
>>>> [1] `nodetool status` output:
>>>>
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> -- Address Load Tokens Owns Host
>>>> ID Rack
>>>> UN A (Good) 252.37 GB 256 23.0%
>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
>>>> UN B (Good) 245.91 GB 256 24.4%
>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
>>>> UN C (Good) 254.79 GB 256 23.7%
>>>> f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
>>>> UN D (Bad) 163.85 GB 256 28.8%
>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
>>>>
>>>> [2] Disk read/write ops:
>>>>
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>
>>>> [3] Network in/out:
>>>>
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>
>>>>
>>>>
>>>
>>
>
Re: New node has high network and disk usage.
Posted by Anuj Wadehra <an...@yahoo.co.in>.
Node 2 has slightly more data, but that should be OK. I'm not sure how read ops are so high when no IO-intensive activity, such as repair or compaction, is running on node 3. Maybe you can try investigating the logs to see what's happening.
Others on the mailing list could also share their views on the situation.
Thanks
Anuj
Sent from Yahoo Mail on Android
On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <ja...@idioplatform.com> wrote:

Hi Anuj,
Below is the output of nodetool status. The nodes were replaced following the instructions in the DataStax documentation for replacing running nodes, since the nodes were running fine; it was just that the servers had been incorrectly initialised and thus had less disk space. The status below shows node 2 has significantly higher load, but as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID                               Rack
UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
Griff
On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:
Hi,
Revisiting the thread, I can see that nodetool status showed both good and bad nodes at the same time. How do you replace nodes? When you say "bad node", I understand that the node is no longer usable even though Cassandra is UP; is that correct?
If a node is in bad shape and not working, adding a new node may trigger streaming huge amounts of data from the bad node too. Have you considered using the procedure for replacing a dead node?
Please share the latest nodetool status.
nodetool output shared earlier:
`nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
Thanks
Anuj
Sent from Yahoo Mail on Android
On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin <ja...@idioplatform.com> wrote:

Hi all,
We’ve spent a few days running things but are in the same position. To add some more flavour:
- We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
- Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had failed RAID0 configuration and thus were lacking in disk space.
- When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
- We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
- We’ve confirmed the configurations across all three nodes are identical and in line with the recommended production settings
- We’ve run a full repair
- Node 2 is currently running compactions, 1 & 3 aren’t and have no pending
- There is no GC happening from what I can see. Node 1 has a GC log, but that’s not been written to since May last year
What we’re seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
- Node 1: Load 3.96, CPU wait 30.8%, Disk Read Ops 408/s
- Node 2: Load 5.88, CPU wait 14.6%, Disk Read Ops 275/s
- Node 3: Load 58.15, CPU wait 87.0%, Disk Read Ops 2,408/s
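For reference, a snapshot like this can be gathered per node with standard tools (a sketch, not necessarily the exact commands used; iostat and sar require the sysstat package):

```shell
uptime            # 1/5/15-minute load averages
iostat -x 1 3     # per-device read ops (r/s) and %iowait, three 1s samples
sar -u 1 3        # CPU utilisation breakdown, including %iowait, over time
```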
Can you recommend any next steps?
Griff
On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
Hi Vickrum,
I would have proceeded with the diagnosis as follows:
1. Analyse sar reports to check system health: CPU, memory, swap, disk etc. The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure that all recommended Cassandra production settings available on the DataStax site are applied; disable zone reclaim and THP.
3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyse the logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems, such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.
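Points 2 and 4 can be checked and adjusted directly on the bad node (a sketch: the procfs/sysfs paths are the standard Linux locations, and 16 MB/s is only an example value, not a recommendation):

```shell
# Point 2: verify kernel settings that commonly hurt Cassandra
cat /proc/sys/vm/zone_reclaim_mode                # should print 0
cat /sys/kernel/mm/transparent_hugepage/enabled   # [never] should be selected
# Point 4: throttle compaction at runtime (value in MB/s; 0 disables throttling)
nodetool setcompactionthroughput 16
```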
Thanks,
Anuj
On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vi...@idioplatform.com> wrote:
# nodetool compactionstats
pending tasks: 22
compaction type   keyspace               table                       completed   total          unit    progress
Compaction        production_analytics   interactions                240410213   161172668724   bytes   0.15%
Compaction        production_decisions   decisions.decisions_q_idx   120815385   226295183      bytes   53.39%
Active compaction remaining time : 2h39m58s
Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.
On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
What’s your output of `nodetool compactionstats`?
On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,
We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.
`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.
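Backlogged pools can be compared across the ring mechanically (a sketch; the host names below are placeholders for ours):

```shell
# Show only thread pools with non-zero Active or Pending counts on each node:
for h in node-a node-b node-c node-d; do
  echo "== $h =="
  nodetool -h "$h" tpstats | awk 'NR > 1 && ($2 + $3) > 0'
done
```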
`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.
Disk activity[2] and network activity[3] on this node are far higher than on the rest.
The only other difference between this node and the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.
Nothing of note in system.log.
What should our next step be in trying to diagnose this issue?
Best wishes,
Vic
[0] `nodetool tpstats` output:
Good node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 46311521 0 0
RequestResponseStage 0 0 23817366 0 0
MutationStage 0 0 47389269 0 0
ReadRepairStage 0 0 11108 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 5259908 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 30 0 0
MemoryMeter 0 0 16563 0 0
FlushWriter 0 0 39637 0 26
ValidationExecutor 0 0 19013 0 0
InternalResponseStage 0 0 9 0 0
AntiEntropyStage 0 0 38026 0 0
MemtablePostFlusher 0 0 81740 0 0
MiscStage 0 0 19196 0 0
PendingRangeCalculator 0 0 23 0 0
CompactionExecutor 0 0 61629 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 63 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 640
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Bad node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 32 113 52216 0 0
RequestResponseStage 0 0 4167 0 0
MutationStage 0 0 127559 0 0
ReadRepairStage 0 0 125 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 9965 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemoryMeter 0 0 24 0 0
FlushWriter 0 0 27 0 1
ValidationExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MemtablePostFlusher 0 0 96 0 0
MiscStage 0 0 0 0 0
PendingRangeCalculator 0 0 10 0 0
CompactionExecutor 1 1 73 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 15 0 0
Message type Dropped
RANGE_SLICE 130
READ_REPAIR 1
PAGED_RANGE 0
BINARY 0
READ 31032
MUTATION 865
_TRACE 0
REQUEST_RESPONSE 7
COUNTER_MUTATION 0
[1] `nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
[2] Disk read/write ops:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
Re: New node has high network and disk usage.
Posted by James Griffin <ja...@idioplatform.com>.
Hi Anuj,
Below is the output of nodetool status. The nodes were replaced following the instructions in the Datastax documentation for replacing running nodes, since the nodes themselves were running fine; it was just that the servers had been incorrectly initialised and thus had less disk space. The status below shows node 2 has a significantly higher load, but as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load      Tokens Owns  Host ID                              Rack
UN 1       253.59 GB 256    31.7% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN 2       302.23 GB 256    35.3% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
UN 3       265.02 GB 256    33.1% 74b15507-db5c-45df-81db-6e5bcb7438a3 rack1
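One caveat when reading the Owns column: ownership percentages are only meaningful relative to a particular replication factor, so it is worth checking them per keyspace (the keyspace name below is taken from the compactionstats output earlier in the thread):

```shell
# Effective ownership for a specific keyspace:
nodetool status production_analytics
```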
Griff
Re: New node has high network and disk usage.
Posted by Anuj Wadehra <an...@yahoo.co.in>.
Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analysis of sar report to check system health -cpu memory swap disk etc. System seems to be overloaded. This is evident from mutation drops.
2. Make sure that all recommended Cassandra production settings available at Datastax site are applied ,disable zone reclaim and THP.
3.Run full Repair on bad node and check data size. Node is owner of maximum token range but has significant lower data.I doubt that bootstrapping happened properly.
4.Compactionstats shows 22 pending compactions. Try throttling compactions via reducing cincurent compactors or compaction throughput.
5.Analyze logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems such as GC pauses to make sure that dropped mutations are not caused by GC pauses.
ThanksAnuj
Sent from Yahoo Mail on Android
On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi<vi...@idioplatform.com> wrote: # nodetool compactionstats
pending tasks: 22
compaction type keyspace table completed total unit progress
Compactionproduction_analytics interactions 240410213 161172668724 bytes 0.15%
Compactionproduction_decisionsdecisions.decisions_q_idx 120815385 226295183 bytes 53.39%
Active compaction remaining time : 2h39m58s
Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.
On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
What’s your output of `nodetool compactionstats`?
On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,
We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.
`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.
`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.
Disk Activity[2] and Network activity[3] on this node is far higher than the rest.
The only other difference between this node and the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are on ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.
Nothing of note in system.log.
What should our next step be in trying to diagnose this issue?
Best wishes,
Vic
[0] `nodetool tpstats` output:
Good node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 46311521 0 0
RequestResponseStage 0 0 23817366 0 0
MutationStage 0 0 47389269 0 0
ReadRepairStage 0 0 11108 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 5259908 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 30 0 0
MemoryMeter 0 0 16563 0 0
FlushWriter 0 0 39637 0 26
ValidationExecutor 0 0 19013 0 0
InternalResponseStage 0 0 9 0 0
AntiEntropyStage 0 0 38026 0 0
MemtablePostFlusher 0 0 81740 0 0
MiscStage 0 0 19196 0 0
PendingRangeCalculator 0 0 23 0 0
CompactionExecutor 0 0 61629 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 63 0 0
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
PAGED_RANGE 0
BINARY 0
READ 640
MUTATION 0
_TRACE 0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0
Bad node:
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 32 113 52216 0 0
RequestResponseStage 0 0 4167 0 0
MutationStage 0 0 127559 0 0
ReadRepairStage 0 0 125 0 0
ReplicateOnWriteStage 0 0 0 0 0
GossipStage 0 0 9965 0 0
CacheCleanupExecutor 0 0 0 0 0
MigrationStage 0 0 0 0 0
MemoryMeter 0 0 24 0 0
FlushWriter 0 0 27 0 1
ValidationExecutor 0 0 0 0 0
InternalResponseStage 0 0 0 0 0
AntiEntropyStage 0 0 0 0 0
MemtablePostFlusher 0 0 96 0 0
MiscStage 0 0 0 0 0
PendingRangeCalculator 0 0 10 0 0
CompactionExecutor 1 1 73 0 0
commitlog_archiver 0 0 0 0 0
HintedHandoff 0 0 15 0 0
Message type Dropped
RANGE_SLICE 130
READ_REPAIR 1
PAGED_RANGE 0
BINARY 0
READ 31032
MUTATION 865
_TRACE 0
REQUEST_RESPONSE 7
COUNTER_MUTATION 0
[1] `nodetool status` output:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN A (Good) 252.37 GB 256 23.0% 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f rack1
UN B (Good) 245.91 GB 256 24.4% 6f0cfff2-babe-4de2-a1e3-6201228dee44 rack1
UN C (Good) 254.79 GB 256 23.7% f4891729-9179-4f19-ab2c-50d387da7ac6 rack1
UN D (Bad) 163.85 GB 256 28.8% faa5b073-6af4-4c80-b280-e7fdd61924d3 rack1
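A rough way to quantify the imbalance from the figures in [1]: if data matched token ownership, each node would hold roughly total load multiplied by its Owns fraction (a heuristic sketch only; with vnodes and RF > 1 the mapping isn't exact):

```python
# Heuristic: actual load vs. load implied by token ownership, per node.
loads = {"A": 252.37, "B": 245.91, "C": 254.79, "D": 163.85}  # GB, from `nodetool status`
owns  = {"A": 0.230,  "B": 0.244,  "C": 0.237,  "D": 0.288}   # Owns column as fractions

total = sum(loads.values())
for node, load in loads.items():
    implied = total * owns[node]
    print(f"{node}: {load:.1f} GB actual vs ~{implied:.1f} GB implied by ownership")
# Node D comes out roughly 100 GB short, consistent with an incomplete bootstrap.
```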
[2] Disk read/write ops:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
Re: New node has high network and disk usage.
Posted by Vickrum Loi <vi...@idioplatform.com>.
# nodetool compactionstats
pending tasks: 22
compaction type   keyspace               table                       completed   total          unit    progress
Compaction        production_analytics   interactions                240410213   161172668724   bytes   0.15%
Compaction        production_decisions   decisions.decisions_q_idx   120815385   226295183      bytes   53.39%
Active compaction remaining time : 2h39m58s
Worth mentioning that compactions haven't been running on this node
particularly often. The node's been performing badly regardless of whether
it's compacting or not.
On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
> What’s your output of `nodetool compactionstats`?
Re: New node has high network and disk usage.
Posted by Jeff Ferland <jb...@tubularlabs.com>.
What’s your output of `nodetool compactionstats`?
Re: New node has high network and disk usage.
Posted by Vickrum Loi <vi...@idioplatform.com>.
I should probably have mentioned that we're on Cassandra 2.0.10.