Posted to user@cassandra.apache.org by Vickrum Loi <vi...@idioplatform.com> on 2016/01/06 16:26:47 UTC

New node has high network and disk usage.

Hi,

We recently added a new node to our cluster in order to replace a node that
died (hardware failure we believe). For the next two weeks it had high disk
and network activity. We replaced the server, but it's happened again.
We've looked into memory allowances, disk performance, number of
connections, and all the nodetool stats, but can't find the cause of the
issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in
comparison to the rest of the cluster, but that's likely a symptom, not a
cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node
(D) has less data.

Disk Activity[2] and Network activity[3] on this node are far higher than
on the rest.

The only other difference between this node and the rest of the cluster is
that it's on the ext4 filesystem, whereas the rest are ext3, but we've done
plenty of testing there and can't see how that would affect performance on
this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic

[0] `nodetool tpstats` output:

Good node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                         0         0       46311521         0                 0
    RequestResponseStage              0         0       23817366         0                 0
    MutationStage                     0         0       47389269         0                 0
    ReadRepairStage                   0         0          11108         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0        5259908         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0             30         0                 0
    MemoryMeter                       0         0          16563         0                 0
    FlushWriter                       0         0          39637         0                26
    ValidationExecutor                0         0          19013         0                 0
    InternalResponseStage             0         0              9         0                 0
    AntiEntropyStage                  0         0          38026         0                 0
    MemtablePostFlusher               0         0          81740         0                 0
    MiscStage                         0         0          19196         0                 0
    PendingRangeCalculator            0         0             23         0                 0
    CompactionExecutor                0         0          61629         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             63         0                 0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    PAGED_RANGE                  0
    BINARY                       0
    READ                       640
    MUTATION                     0
    _TRACE                       0
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0

Bad node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                        32       113          52216         0                 0
    RequestResponseStage              0         0           4167         0                 0
    MutationStage                     0         0         127559         0                 0
    ReadRepairStage                   0         0            125         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0           9965         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0              0         0                 0
    MemoryMeter                       0         0             24         0                 0
    FlushWriter                       0         0             27         0                 1
    ValidationExecutor                0         0              0         0                 0
    InternalResponseStage             0         0              0         0                 0
    AntiEntropyStage                  0         0              0         0                 0
    MemtablePostFlusher               0         0             96         0                 0
    MiscStage                         0         0              0         0                 0
    PendingRangeCalculator            0         0             10         0                 0
    CompactionExecutor                1         1             73         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             15         0                 0

    Message type           Dropped
    RANGE_SLICE                130
    READ_REPAIR                  1
    PAGED_RANGE                  0
    BINARY                       0
    READ                     31032
    MUTATION                   865
    _TRACE                       0
    REQUEST_RESPONSE             7
    COUNTER_MUTATION             0


[1] `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:


https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png

https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:


https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png

https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png

Re: New node has high network and disk usage.

Posted by Kai Wang <de...@gmail.com>.
James,

Thanks for sharing. Anyway, good to know there's one more thing to add to
the checklist.

On Sun, Jan 17, 2016 at 12:23 PM, James Griffin <
james.griffin@idioplatform.com> wrote:

> Hi all,
>
> Just to let you know, we finally figured this out on Friday. It turns out
> the new nodes had an older version of the kernel installed. Upgrading the
> kernel solved our issues. For reference, the "bad" kernel was
> 3.2.0-75-virtual, upgrading to 3.2.0-86-virtual resolved the issue. We
> still don't fully understand why this kernel bug didn't affect *all *our
> nodes (in the end we had three nodes with that kernel, only two of them
> exhibited this issue), but there we go.
>
> Thanks everyone for your help
>
> Cheers,
> Griff
>
> On 14 January 2016 at 15:14, James Griffin <james.griffin@idioplatform.com
> > wrote:
>
>> Hi Kai,
>>
>> Well observed - running `nodetool status` without specifying keyspace
>> does report ~33% on each node. We have two keyspaces on this cluster - if I
>> specify either of them the ownership reported by each node is 100%, so I
>> believe the repair completed successfully.
>>
>> Best wishes,
>>
>> Griff
>>
>>
>> On 14 January 2016 at 15:08, Kai Wang <de...@gmail.com> wrote:
>>
>>> James,
>>>
>>> I may miss something. You mentioned your cluster had RF=3. Then why
>>> does "nodetool status" show each node owns 1/3 of the data especially after
>>> a full repair?
>>>
>>> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
>>> james.griffin@idioplatform.com> wrote:
>>>
>>>> Hi Kai,
>>>>
>>>> Below - nothing going on that I can see
>>>>
>>>> $ nodetool netstats
>>>> Mode: NORMAL
>>>> Not sending any streams.
>>>> Read Repair Statistics:
>>>> Attempted: 0
>>>> Mismatch (Blocking): 0
>>>> Mismatch (Background): 0
>>>> Pool Name                    Active   Pending      Completed
>>>> Commands                        n/a         0           6326
>>>> Responses                       n/a         0         219356
>>>>
>>>>
>>>>
>>>> Best wishes,
>>>>
>>>> Griff
>>>>
>>>>
>>>> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>>>>
>>>>> James,
>>>>>
>>>>> Can you post the result of "nodetool netstats" on the bad node?
>>>>>
>>>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>>>> james.griffin@idioplatform.com> wrote:
>>>>>
>>>>>> A summary of what we've done this morning:
>>>>>>
>>>>>>    - Noted that there are no GCInspector lines in system.log on bad
>>>>>>    node (there are GCInspector logs on other healthy nodes)
>>>>>>    - Turned on GC logging, noted that we had logs which stated out
>>>>>>    total time for which application threads were stopped was high - ~10s.
>>>>>>    - Not seeing failures or any kind (promotion or concurrent mark)
>>>>>>    - Attached Visual VM: noted that heap usage was very low (~5%
>>>>>>    usage and stable) and it didn't display hallmarks GC of activity. PermGen
>>>>>>    also very stable
>>>>>>    - Downloaded GC logs and examined in GC Viewer. Noted that:
>>>>>>    - We had lots of pauses (again around 10s), but no full GC.
>>>>>>       - From a 2,300s sample, just over 2,000s were spent with
>>>>>>       threads paused
>>>>>>       - Spotted many small GCs in the new space - realised that Xmn
>>>>>>       value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>>>>       937M - no change in server behaviour (high load, high reads/s on disk, high
>>>>>>       CPU wait)
>>>>>>
>>>>>> Current output of jstat:
>>>>>>
>>>>>>      S0     S1     E      O      P    YGC     YGCT    FGC    FGCT      GCT
>>>>>> 2  0.00  45.20  12.82  26.84  76.21   2333   63.684     2    0.039   63.724
>>>>>> 3 63.58   0.00  33.68   8.04  75.19     14    1.812     2    0.103    1.915
>>>>>>
>>>>>> Correct me if I'm wrong, but it seems 3 is lot more healthy GC wise
>>>>>> than 2 (which has normal load statistics).
>>>>>>
>>>>>> Anywhere else you can recommend we look?
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>>>>> cause for that.
>>>>>>> Can you just search the word GCInspector in system.log and share the
>>>>>>> frequency of minor and full gc. Moreover, are you printing promotion
>>>>>>> failures in gc logs?? Why full gc ia getting triggered??promotion failures
>>>>>>> or concurrent mode failures?
>>>>>>>
>>>>>>> If you are on CMS, you need to fine tune your heap options to
>>>>>>> address full gc.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>> I think I was incorrect in assuming GC wasn't an issue due to the
>>>>>>> lack of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>>>>>> differences, though
>>>>>>> comparing the startup flags on the two machines show the GC config
>>>>>>> is identical.:
>>>>>>>
>>>>>>> $ jstat -gcutil
>>>>>>>      S0     S1     E      O      P     YGC      YGCT    FGC    FGCT       GCT
>>>>>>> 2  5.08   0.00  55.72  18.24  59.90   25986   619.827    28    1.597   621.424
>>>>>>> 3  0.00   0.00  22.79  17.87  59.99  422600 11225.979   668   57.383 11283.361
>>>>>>>
>>>>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>>>>
>>>>>>> $ iostat -dmx md0
>>>>>>>
>>>>>>>   Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>>>>> 2 md0               0.00     0.00  339.00    0.00     9.77     0.00    59.00     0.00    0.00    0.00    0.00   0.00   0.00
>>>>>>> 3 md0               0.00     0.00 2069.00    1.00    85.85     0.00    84.94     0.00    0.00    0.00    0.00   0.00   0.00
>>>>>>>
>>>>>>> Griff
>>>>>>>
>>>>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>>>>>> read ops are so high when no IO intensive activity such as repair and
>>>>>>>> compaction is running on node 3.May be you can try investigating logs to
>>>>>>>> see whats happening.
>>>>>>>>
>>>>>>>> Others on the mailing list could also share their views on the
>>>>>>>> situation.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anuj
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>
>>>>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>>> Hi Anuj,
>>>>>>>>
>>>>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>>>>> following the instructions in Datastax documentation for replacing running
>>>>>>>> nodes since the nodes were running fine, it was that the servers had been
>>>>>>>> incorrectly initialised and they thus had less disk space. The status below
>>>>>>>> shows 2 has significantly higher load, however as I say 2 is operating
>>>>>>>> normally and is running compactions, so I guess that's not an issue?
>>>>>>>>
>>>>>>>> Datacenter: datacenter1
>>>>>>>> =======================
>>>>>>>> Status=Up/Down
>>>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>>>> --  Address         Load       Tokens  Owns   Host ID                               Rack
>>>>>>>> UN  1               253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>> UN  2               302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>> UN  3               265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>>>>>>>
>>>>>>>> Griff
>>>>>>>>
>>>>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Revisiting the thread I can see that nodetool status had both good
>>>>>>>>> and bad nodes at same time. How do you replace nodes? When you say bad
>>>>>>>>> node..I understand that the node is no more usable even though Cassandra is
>>>>>>>>> UP? Is that correct?
>>>>>>>>>
>>>>>>>>> If a node is in bad shape and not working, adding new node may
>>>>>>>>> trigger streaming huge data from bad node too. Have you considered using
>>>>>>>>> the procedure for replacing a dead node?
>>>>>>>>>
>>>>>>>>> Please share Latest nodetool status.
>>>>>>>>>
>>>>>>>>> nodetool output shared earlier:
>>>>>>>>>
>>>>>>>>>  `nodetool status` output:
>>>>>>>>>
>>>>>>>>>     Status=Up/Down
>>>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>>>> ID                               Rack
>>>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Anuj
>>>>>>>>>
>>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>>
>>>>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> We’ve spent a few days running things but are in the same
>>>>>>>>> position. To add some more flavour:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>>>>    running in this configuration for a few years without any real issues
>>>>>>>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>>>>    brought in to replace two other nodes which had failed RAID0 configuration
>>>>>>>>>    and thus were lacking in disk space.
>>>>>>>>>    - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>>>>    wait, IO and load metrics
>>>>>>>>>    - We subsequently brought 3 into the ring: as soon as 3 was
>>>>>>>>>    fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>>>>    levels. Those same stats on 3, however, sky-rocketed
>>>>>>>>>    - We’ve confirmed configuration across all three nodes are
>>>>>>>>>    identical and in line with the recommended production settings
>>>>>>>>>    - We’ve run a full repair
>>>>>>>>>    - Node 2 is currently running compactions, 1 & 3 aren’t and
>>>>>>>>>    have no pending
>>>>>>>>>    - There is no GC happening from what I can see. Node 1 has a
>>>>>>>>>    GC log, but that’s not been written to since May last year
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What we’re seeing at the moment is similar and normal stats on
>>>>>>>>> nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you recommend any next steps?
>>>>>>>>>
>>>>>>>>> Griff
>>>>>>>>>
>>>>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Vickrum,
>>>>>>>>>>
>>>>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>>>>
>>>>>>>>>> 1. Analysis of sar report to check system health -cpu memory
>>>>>>>>>> swap disk etc.
>>>>>>>>>> System seems to be overloaded. This is evident from mutation
>>>>>>>>>> drops.
>>>>>>>>>>
>>>>>>>>>> 2. Make sure that  all recommended Cassandra production settings
>>>>>>>>>> available at Datastax site are applied ,disable zone reclaim and THP.
>>>>>>>>>>
>>>>>>>>>> 3.Run full Repair on bad node and check data size. Node is owner
>>>>>>>>>> of maximum token range but has significant lower data.I doubt that
>>>>>>>>>> bootstrapping happened properly.
>>>>>>>>>>
>>>>>>>>>> 4.Compactionstats shows 22 pending compactions. Try throttling
>>>>>>>>>> compactions via reducing cincurent compactors or compaction throughput.
>>>>>>>>>>
>>>>>>>>>> 5.Analyze logs to make sure bootstrapping happened without errors.
>>>>>>>>>>
>>>>>>>>>> 6. Look for other common performance problems such as GC pauses
>>>>>>>>>> to make sure that dropped mutations are not caused by GC pauses.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Anuj
>>>>>>>>>>
>>>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>>>
>>>>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>>>>> # nodetool compactionstats
>>>>>>>>>> pending tasks: 22
>>>>>>>>>>     compaction type   keyspace               table                       completed    total         unit   progress
>>>>>>>>>>     Compaction        production_analytics   interactions                240410213    161172668724  bytes  0.15%
>>>>>>>>>>     Compaction        production_decisions   decisions.decisions_q_idx   120815385    226295183     bytes  53.39%
>>>>>>>>>> Active compaction remaining time :   2h39m58s
>>>>>>>>>>
>>>>>>>>>> Worth mentioning that compactions haven't been running on this
>>>>>>>>>> node particularly often. The node's been performing badly regardless of
>>>>>>>>>> whether it's compacting or not.
>>>>>>>>>>
>>>>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>>>>
>>>>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We recently added a new node to our cluster in order to replace
>>>>>>>>>>> a node that died (hardware failure we believe). For the next two weeks it
>>>>>>>>>>> had high disk and network activity. We replaced the server, but it's
>>>>>>>>>>> happened again. We've looked into memory allowances, disk performance,
>>>>>>>>>>> number of connections, and all the nodetool stats, but can't find the cause
>>>>>>>>>>> of the issue.
>>>>>>>>>>>
>>>>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads,
>>>>>>>>>>> in comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>>>>>>> a cause.
>>>>>>>>>>>
>>>>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>>>>> bad node (D) has less data.
>>>>>>>>>>>
>>>>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far
>>>>>>>>>>> higher than the rest.
>>>>>>>>>>>
>>>>>>>>>>> The only other difference this node has to the rest of the
>>>>>>>>>>> cluster is that its on the ext4 filesystem, whereas the rest are ext3, but
>>>>>>>>>>> we've done plenty of testing there and can't see how that would affect
>>>>>>>>>>> performance on this node so much.
>>>>>>>>>>>
>>>>>>>>>>> Nothing of note in system.log.
>>>>>>>>>>>
>>>>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>>>>
>>>>>>>>>>> Best wishes,
>>>>>>>>>>> Vic
>>>>>>>>>>>
>>>>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>>>>
>>>>>>>>>>> Good node:
>>>>>>>>>>>     Pool Name                    Active   Pending
>>>>>>>>>>> Completed   Blocked  All time blocked
>>>>>>>>>>>     ReadStage                         0         0
>>>>>>>>>>> 46311521         0                 0
>>>>>>>>>>>     RequestResponseStage              0         0
>>>>>>>>>>> 23817366         0                 0
>>>>>>>>>>>     MutationStage                     0         0
>>>>>>>>>>> 47389269         0                 0
>>>>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>>>>> 11108         0                 0
>>>>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     GossipStage                       0         0
>>>>>>>>>>> 5259908         0                 0
>>>>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     MigrationStage                    0         0
>>>>>>>>>>> 30         0                 0
>>>>>>>>>>>     MemoryMeter                       0         0
>>>>>>>>>>> 16563         0                 0
>>>>>>>>>>>     FlushWriter                       0         0
>>>>>>>>>>> 39637         0                26
>>>>>>>>>>>     ValidationExecutor                0         0
>>>>>>>>>>> 19013         0                 0
>>>>>>>>>>>     InternalResponseStage             0         0
>>>>>>>>>>> 9         0                 0
>>>>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>>>>> 38026         0                 0
>>>>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>>>>> 81740         0                 0
>>>>>>>>>>>     MiscStage                         0         0
>>>>>>>>>>> 19196         0                 0
>>>>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>>>>> 23         0                 0
>>>>>>>>>>>     CompactionExecutor                0         0
>>>>>>>>>>> 61629         0                 0
>>>>>>>>>>>     commitlog_archiver                0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     HintedHandoff                     0         0
>>>>>>>>>>> 63         0                 0
>>>>>>>>>>>
>>>>>>>>>>>     Message type           Dropped
>>>>>>>>>>>     RANGE_SLICE                  0
>>>>>>>>>>>     READ_REPAIR                  0
>>>>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>>>>     BINARY                       0
>>>>>>>>>>>     READ                       640
>>>>>>>>>>>     MUTATION                     0
>>>>>>>>>>>     _TRACE                       0
>>>>>>>>>>>     REQUEST_RESPONSE             0
>>>>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>>>>
>>>>>>>>>>> Bad node:
>>>>>>>>>>>     Pool Name                    Active   Pending
>>>>>>>>>>> Completed   Blocked  All time blocked
>>>>>>>>>>>     ReadStage                        32       113
>>>>>>>>>>> 52216         0                 0
>>>>>>>>>>>     RequestResponseStage              0         0
>>>>>>>>>>> 4167         0                 0
>>>>>>>>>>>     MutationStage                     0         0
>>>>>>>>>>> 127559         0                 0
>>>>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>>>>> 125         0                 0
>>>>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     GossipStage                       0         0
>>>>>>>>>>> 9965         0                 0
>>>>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     MigrationStage                    0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     MemoryMeter                       0         0
>>>>>>>>>>> 24         0                 0
>>>>>>>>>>>     FlushWriter                       0         0
>>>>>>>>>>> 27         0                 1
>>>>>>>>>>>     ValidationExecutor                0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     InternalResponseStage             0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>>>>> 96         0                 0
>>>>>>>>>>>     MiscStage                         0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>>>>> 10         0                 0
>>>>>>>>>>>     CompactionExecutor                1         1
>>>>>>>>>>> 73         0                 0
>>>>>>>>>>>     commitlog_archiver                0         0
>>>>>>>>>>> 0         0                 0
>>>>>>>>>>>     HintedHandoff                     0         0
>>>>>>>>>>> 15         0                 0
>>>>>>>>>>>
>>>>>>>>>>>     Message type           Dropped
>>>>>>>>>>>     RANGE_SLICE                130
>>>>>>>>>>>     READ_REPAIR                  1
>>>>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>>>>     BINARY                       0
>>>>>>>>>>>     READ                     31032
>>>>>>>>>>>     MUTATION                   865
>>>>>>>>>>>     _TRACE                       0
>>>>>>>>>>>     REQUEST_RESPONSE             7
>>>>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] `nodetool status` output:
>>>>>>>>>>>
>>>>>>>>>>>     Status=Up/Down
>>>>>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>>>>>> ID                               Rack
>>>>>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>>>>>
>>>>>>>>>>> [2] Disk read/write ops:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>>>>
>>>>>>>>>>> [3] Network in/out:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>>>>
>>>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
Hi all,

Just to let you know, we finally figured this out on Friday. It turns out
the new nodes had an older version of the kernel installed. Upgrading the
kernel solved our issues. For reference, the "bad" kernel was
3.2.0-75-virtual, upgrading to 3.2.0-86-virtual resolved the issue. We
still don't fully understand why this kernel bug didn't affect *all* our
nodes (in the end we had three nodes with that kernel, only two of them
exhibited this issue), but there we go.
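
In case it helps anyone else, the check and the upgrade look roughly like
this (a sketch only, assuming Ubuntu; the exact kernel package name will
vary by distribution and release):

    # confirm which kernel a node is running
    $ uname -r
    3.2.0-75-virtual

    # install the newer kernel and reboot (package name assumed here)
    $ sudo apt-get update
    $ sudo apt-get install linux-image-3.2.0-86-virtual
    $ sudo reboot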

Thanks everyone for your help

Cheers,
Griff


Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you Sebastián!

On 15 Jan 2016, at 19:09, Sebastian Estevez <se...@datastax.com> wrote:

The recommended (and default when available) heap size for Cassandra is 8 GB, and for new size it's 100 MB per core.

Your mileage may vary based on workload, hardware etc.

There are also some alternative JVM tuning schools of thought. See CASSANDRA-8150 (large heap) and CASSANDRA-7486 (G1GC).
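
As a rough sketch of what that guidance translates to (the values below are only an example, assuming a 16 GB, 8-core node; they are normally set in cassandra-env.sh, or calculated automatically when left unset):

    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"    # roughly 100 MB per core on an 8-core machine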



All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

On Fri, Jan 15, 2016 at 4:00 AM, Jean Tremblay <je...@zen-innovations.com> wrote:
Thank you Sebastián for your useful advice. I managed to restart the nodes, but I needed to delete all the commit logs, not only the last one specified. Nevertheless, I'm back in business.

Would there be a better memory configuration to select for my nodes in a C* 3 cluster? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.

Thanks for your help.

Jean

On 15 Jan 2016, at 24:24, Sebastian Estevez <se...@datastax.com> wrote:

Try starting the other nodes. You may have to delete or mv the commitlog segment referenced in the error message for the node to come up since apparently it is corrupted.
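
Something along these lines, for example (the commitlog directory and segment name below are placeholders; use the commitlog_directory from your cassandra.yaml and the exact file named in the startup error):

    # move the corrupted segment out of the way, then start the node again
    $ sudo mv /var/lib/cassandra/commitlog/CommitLog-<version>-<id>.log /tmp/
    $ sudo service cassandra start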

All the best,

Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com

On Thu, Jan 14, 2016 at 1:00 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?

On 14 Jan 2016, at 18:30, Jake Luciani <ja...@gmail.com>> wrote:

Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node and perhaps what your schema looks like?

Thanks

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:

Ok, I will open a ticket.

How could I restart my cluster without loosing everything ?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE=“496M” for a 16M RAM node.

Thanks

Jean

On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:

I don't think that's a known issue.  Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?

On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have a small Cassandra Cluster with 5 nodes, having 16MB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"

I have been loading a lot of data in this cluster over the last 24 hours. The system behaved I think very nicely. It was loading very fast, and giving excellent read time. There was no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I have the following error;

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]

I can no longer start my nodes.

How can I restart my cluster?
Is this problem known?
Is there a better Cassandra 3 version which would behave better with respect to this problem?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE=“496M” for a 16M RAM node.


Thank you very much for your advice.

Kind regards

Jean



--
Tyler Hobbs
DataStax<http://datastax.com/>



--
http://twitter.com/tjake





Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Sebastian Estevez <se...@datastax.com>.
The recommended (and default when available) heap size for Cassandra is 8 GB,
and for the new generation size it's 100 MB per core.

Your mileage may vary based on workload, hardware, etc.

There are also some alternative JVM tuning schools of thought. See
CASSANDRA-8150 (large heap) and CASSANDRA-7486 (G1GC).
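
As a rough sketch of what that advice translates to in conf/cassandra-env.sh (the 16 GB / 4-core figures below are assumptions for illustration, not from this thread; adjust to the actual hardware):

# conf/cassandra-env.sh -- sketch only, assuming a 16 GB node with 4 cores
MAX_HEAP_SIZE="8G"        # recommended total heap
HEAP_NEWSIZE="400M"       # roughly 100 MB per core for the CMS new generation
# CASSANDRA-7486 discusses switching from CMS to G1 instead; that would be a
# JVM_OPTS change, e.g. JVM_OPTS="$JVM_OPTS -XX:+UseG1GC", rather than a
# new-generation size tweak.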



All the best,


Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Jean Tremblay <je...@zen-innovations.com>.
Thank you Sebastián for your useful advice. I managed to restart the nodes, but I needed to delete all the commit logs, not only the one segment that was specified. Nevertheless I'm back in business.

Would there be a better memory configuration to select for my nodes in a C* 3 cluster? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.

Thanks for your help.

Jean
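
The brute-force variant Jean describes would look roughly like the sketch below (the paths are the stock package defaults and may differ per install; they are assumptions, not taken from this thread). Note that wiping the whole commit log silently drops any writes that had not yet been flushed to SSTables, so a repair afterwards is prudent:

# with the node stopped:
rm /var/lib/cassandra/commitlog/CommitLog-*.log
sudo service cassandra start
nodetool repair     # re-sync whatever the discarded segments contained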



Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Sebastian Estevez <se...@datastax.com>.
Try starting the other nodes. You may have to delete or mv the commitlog
segment referenced in the error message for the node to come up, since that
segment is apparently corrupted.
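
A minimal sketch of that step (the segment name and paths below are made-up examples; use the file actually named in the node's error message and the commitlog_directory from cassandra.yaml, and a packaged install may use systemctl instead of service):

sudo service cassandra stop
# move the corrupted segment out of the way rather than deleting it outright
mv /var/lib/cassandra/commitlog/CommitLog-6-1452787397382.log /var/tmp/
sudo service cassandra start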

All the best,


Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Jean Tremblay <je...@zen-innovations.com>.
How can I restart?
It blocks with the error listed below.
Are my memory settings good for my configuration?

On 14 Jan 2016, at 18:30, Jake Luciani <ja...@gmail.com>> wrote:

Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node and perhaps what your schema looks like?

Thanks

On Thu, Jan 14, 2016 at 12:24 PM, Jean Tremblay <je...@zen-innovations.com>> wrote:

Ok, I will open a ticket.

How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.

Thanks

Jean

On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:

I don't think that's a known issue.  Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?

On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have a small Cassandra cluster with 5 nodes, each having 16 GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"

I have been loading a lot of data into this cluster over the last 24 hours. I think the system behaved very nicely: it was loading very fast and giving excellent read times. There were no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I get the following error:

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]

I can no longer start my nodes.

How can I restart my cluster?
Is this problem known?
Is there a Cassandra 3 version that handles this problem better?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.


Thank you very much for your advice.

Kind regards

Jean



--
Tyler Hobbs
DataStax<http://datastax.com/>



--
http://twitter.com/tjake

Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Jake Luciani <ja...@gmail.com>.
Yes you can restart without data loss.

Can you please include info about how much data you have loaded per node
and perhaps what your schema looks like?

Thanks
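
A quick way to gather that information with the standard tools, shown only as a sketch (my_keyspace below is a placeholder, not a name from this thread):

nodetool status                            # per-node load, ownership, state
nodetool cfstats my_keyspace               # per-table size and count estimates
cqlsh -e "DESCRIBE KEYSPACE my_keyspace" > schema.cql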


-- 
http://twitter.com/tjake

Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Jean Tremblay <je...@zen-innovations.com>.
Ok, I will open a ticket.

How could I restart my cluster without losing everything?
Would there be a better memory configuration to select for my nodes? Currently I use MAX_HEAP_SIZE="6G" HEAP_NEWSIZE="496M" for a 16 GB RAM node.

Thanks

Jean

On 14 Jan 2016, at 18:19, Tyler Hobbs <ty...@datastax.com>> wrote:

I don't think that's a known issue.  Can you open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and attach your schema along with the commitlog files and the mutation that was saved to /tmp?

On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <je...@zen-innovations.com>> wrote:
Hi,

I have a small Cassandra Cluster with 5 nodes, having 16MB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"

I have been loading a lot of data in this cluster over the last 24 hours. The system behaved I think very nicely. It was loading very fast, and giving excellent read time. There was no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

4 nodes out of 5 crashed with this error message. Now when I want to restart the first node I have the following error;

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]

I can no longer start my nodes.

How can I restart my cluster?
Is this problem known?
Is there a Cassandra 3 version that handles this problem better?
Would there be a better memory configuration for my nodes? Currently I use MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="496M" on nodes with 16 GB of RAM.
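For context, the stock cassandra-env.sh heuristic sizes the heap at roughly a quarter of system RAM (capped at 8 GB) and the new generation at about 100 MB per CPU core. A sketch of what that works out to on a 16 GB, 8-core host is below; the core count is an assumption and the values are illustrative, not a recommendation for this cluster:

    MAX_HEAP_SIZE="4G"     # ~1/4 of 16 GB RAM, under the 8 GB cap
    HEAP_NEWSIZE="800M"    # ~100 MB per core, assuming 8 cores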


Thank you very much for your advice.

Kind regards

Jean



--
Tyler Hobbs
DataStax<http://datastax.com/>

Re: Cassandra 3.1.1 with respect to HeapSpace

Posted by Tyler Hobbs <ty...@datastax.com>.
I don't think that's a known issue.  Can you open a ticket at
https://issues.apache.org/jira/browse/CASSANDRA and attach your schema
along with the commitlog files and the mutation that was saved to /tmp?
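For anyone following along, gathering those attachments could look something like the sketch below; the keyspace name is a placeholder and the commit log path assumes the default packaged layout:

    $ cqlsh -e "DESCRIBE KEYSPACE my_keyspace" > schema.cql   # repeat per keyspace
    $ tar czf jira-attachments.tar.gz \
          schema.cql \
          /var/lib/cassandra/commitlog/ \
          /tmp/mutation7465380878750576105dat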

On Thu, Jan 14, 2016 at 10:56 AM, Jean Tremblay <
jean.tremblay@zen-innovations.com> wrote:

> Hi,
>
> I have a small Cassandra cluster with 5 nodes, each having 16 GB of RAM.
> I use Cassandra 3.1.1.
> I use the following setup for the memory:
>   MAX_HEAP_SIZE="6G"
> HEAP_NEWSIZE="496M"
>
> I have been loading a lot of data in this cluster over the last 24 hours.
> The system behaved I think very nicely. It was loading very fast, and
> giving excellent read time. There were no error messages until this one:
>
>
> ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602
> JVMStabilityInspector.java:139 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
> at
> org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_65]
> at
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
> ~[apache-cassandra-3.1.1.jar:3.1.1]
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
>
> Four nodes out of five crashed with this error message. Now, when I try to
> restart the first node, I get the following error:
>
> ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 -
> Exiting due to error while processing commit log during initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /tmp/mutation7465380878750576105dat.  This may be caused by replaying a
> mutation against a table with the same name but incompatible schema.
> Exception follows: org.apache.cassandra.serializers.MarshalException: Not
> enough bytes to read a map
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549)
> [apache-cassandra-3.1.1.jar:3.1.1]
> at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677)
> [apache-cassandra-3.1.1.jar:3.1.1]
>
> I can no longer start my nodes.
>
> How can I restart my cluster?
> Is this problem known?
> Is there a Cassandra 3 version that handles this problem better?
> Would there be a better memory configuration for my nodes?
> Currently I use MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="496M" on nodes with 16 GB of RAM.
>
>
> Thank you very much for your advice.
>
> Kind regards
>
> Jean
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Cassandra 3.1.1 with respect to HeapSpace

Posted by Jean Tremblay <je...@zen-innovations.com>.
Hi,

I have a small Cassandra cluster with 5 nodes, each having 16 GB of RAM.
I use Cassandra 3.1.1.
I use the following setup for the memory:
  MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="496M"

I have been loading a lot of data into this cluster over the last 24 hours. The system behaved, I think, very nicely: it was loading very fast and giving excellent read times. There were no error messages until this one:


ERROR [SharedPool-Worker-35] 2016-01-14 17:05:23,602 JVMStabilityInspector.java:139 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.8.0_65]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_65]
at org.apache.cassandra.io.util.DataOutputBuffer.reallocate(DataOutputBuffer.java:126) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.DataOutputBuffer.doFlush(DataOutputBuffer.java:86) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:151) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.utils.ByteBufferUtil.writeWithVIntLength(ByteBufferUtil.java:297) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.marshal.AbstractType.writeValue(AbstractType.java:374) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.BufferCell$Serializer.serialize(BufferCell.java:263) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:183) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:298) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:136) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:128) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.1.1.jar:3.1.1]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.1.1.jar:3.1.1]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]

Four nodes out of five crashed with this error message. Now, when I try to restart the first node, I get the following error:

ERROR [main] 2016-01-14 17:15:59,617 JVMStabilityInspector.java:81 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Unexpected error deserializing mutation; saved to /tmp/mutation7465380878750576105dat.  This may be caused by replaying a mutation against a table with the same name but incompatible schema.  Exception follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:633) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayMutation(CommitLogReplayer.java:556) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:509) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:404) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:151) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:283) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:549) [apache-cassandra-3.1.1.jar:3.1.1]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:677) [apache-cassandra-3.1.1.jar:3.1.1]

I can no longer start my nodes.

How can I restart my cluster?
Is this problem known?
Is there a Cassandra 3 version that handles this problem better?
Would there be a better memory configuration for my nodes? Currently I use MAX_HEAP_SIZE="6G" and HEAP_NEWSIZE="496M" on nodes with 16 GB of RAM.


Thank you very much for your advice.

Kind regards

Jean

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
Hi Kai,

Well observed - running `nodetool status` without specifying keyspace does
report ~33% on each node. We have two keyspaces on this cluster - if I
specify either of them the ownership reported by each node is 100%, so I
believe the repair completed successfully.
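In other words, the two forms of the command report different things (the keyspace name below is a placeholder):

    $ nodetool status                 # no keyspace: ownership split roughly 33% per node
    $ nodetool status my_keyspace     # with RF=3 on a 3-node ring, each node reports 100%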

Best wishes,

Griff

[image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
CTO
Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
@imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin
idio helps major brands and publishers to build closer relationships with
their customers and prospects by learning from their content consumption
and acting on that insight. We call it Content Intelligence, and it
integrates with your existing marketing technology to provide detailed
customer interest profiles in real-time across all channels, and to
personalize content into every channel for every customer. See
http://idioplatform.com
for
more information.

On 14 January 2016 at 15:08, Kai Wang <de...@gmail.com> wrote:

> James,
>
> I may miss something. You mentioned your cluster had RF=3. Then why does
> "nodetool status" show each node owns 1/3 of the data especially after a
> full repair?
>
> On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
> james.griffin@idioplatform.com> wrote:
>
>> Hi Kai,
>>
>> Below - nothing going on that I can see
>>
>> $ nodetool netstats
>> Mode: NORMAL
>> Not sending any streams.
>> Read Repair Statistics:
>> Attempted: 0
>> Mismatch (Blocking): 0
>> Mismatch (Background): 0
>> Pool Name                    Active   Pending      Completed
>> Commands                        n/a         0           6326
>> Responses                       n/a         0         219356
>>
>>
>>
>> Best wishes,
>>
>> Griff
>>
>> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
>> CTO
>> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 |
>> Twitter: @imaginaryroots <http://twitter.com/imaginaryroots> | Skype:
>> j.s.griffin
>> idio helps major brands and publishers to build closer relationships with
>> their customers and prospects by learning from their content consumption
>> and acting on that insight. We call it Content Intelligence, and it
>> integrates with your existing marketing technology to provide detailed
>> customer interest profiles in real-time across all channels, and to
>> personalize content into every channel for every customer. See
>> http://idioplatform.com
>> <https://t.yesware.com/tl/0e637e4938676b6f3897def79d0810a71e59612e/10068de2036c2daf922e0a879bb2fe92/9dae8be0f7693bf2b28a88cc4b38c554?ytl=http%3A%2F%2Fidioplatform.com%2F> for
>> more information.
>>
>> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>>
>>> James,
>>>
>>> Can you post the result of "nodetool netstats" on the bad node?
>>>
>>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>>> james.griffin@idioplatform.com> wrote:
>>>
>>>> A summary of what we've done this morning:
>>>>
>>>>    - Noted that there are no GCInspector lines in system.log on bad
>>>>    node (there are GCInspector logs on other healthy nodes)
>>>>    - Turned on GC logging (flag sketch follows this list), noted that we had
>>>>    logs which stated the total time for which application threads were
>>>>    stopped was high - ~10s.
>>>>    - Not seeing failures of any kind (promotion or concurrent mark)
>>>>    - Attached VisualVM: noted that heap usage was very low (~5% usage
>>>>    and stable) and it didn't display the hallmarks of GC activity. PermGen also
>>>>    very stable
>>>>    - Downloaded GC logs and examined in GC Viewer. Noted that:
>>>>    - We had lots of pauses (again around 10s), but no full GC.
>>>>       - From a 2,300s sample, just over 2,000s were spent with threads
>>>>       paused
>>>>       - Spotted many small GCs in the new space - realised that Xmn
>>>>       value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>>       937M - no change in server behaviour (high load, high reads/s on disk, high
>>>>       CPU wait)
>>>>
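A minimal sketch of the GC-logging flags behind the list above, as they might be appended in cassandra-env.sh on a CMS heap; the log path is an assumption, and the -Xmn value is the one tried above:

    JVM_OPTS="$JVM_OPTS -Xmn937M"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure"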
>>>> Current output of jstat:
>>>>
>>>>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>>> 2  0.00  45.20  12.82  26.84  76.21   2333   63.684     2    0.039   63.724
>>>> 3 63.58   0.00  33.68   8.04  75.19     14    1.812     2    0.103    1.915
>>>>
>>>> Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise
>>>> than 2 (which has normal load statistics).
>>>>
>>>> Anywhere else you can recommend we look?
>>>>
>>>> Griff
>>>>
>>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>>> cause for that.
>>>>> Can you just search for the word GCInspector in system.log and share the
>>>>> frequency of minor and full GC? Moreover, are you printing promotion
>>>>> failures in the GC logs? Why is full GC getting triggered - promotion
>>>>> failures or concurrent mode failures?
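A quick way to pull out that frequency, assuming the default packaged log location:

    $ grep -c GCInspector /var/log/cassandra/system.log            # count of GCInspector lines
    $ grep GCInspector /var/log/cassandra/system.log | tail -n 5   # most recent entries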
>>>>>
>>>>> If you are on CMS, you need to fine tune your heap options to address
>>>>> full gc.
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> Sent from Yahoo Mail on Android
>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>
>>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>>> <ja...@idioplatform.com> wrote:
>>>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>>>> of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>>>> differences, though
>>>>> comparing the startup flags on the two machines show the GC config is
>>>>> identical.:
>>>>>
>>>>> $ jstat -gcutil
>>>>>    S0     S1     E      O      P     YGC     YGCT    FGC    FGCT      GCT
>>>>> 2  5.08   0.00  55.72  18.24  59.90  25986  619.827    28    1.597   621.424
>>>>> 3  0.00   0.00  22.79  17.87  59.99 422600 11225.979   668   57.383 11283.361
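For reference, the full invocation behind that output would be along these lines; the pid lookup is an assumption about how the process is found, and the interval/count are arbitrary:

    $ jstat -gcutil $(pgrep -f CassandraDaemon) 5000 12   # sample every 5 s, 12 times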
>>>>>
>>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>>
>>>>> $ iostat -dmx md0
>>>>>
>>>>>   Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>>> 2 md0               0.00     0.00  339.00    0.00     9.77     0.00    59.00     0.00    0.00    0.00    0.00   0.00   0.00
>>>>> 3 md0               0.00     0.00 2069.00    1.00    85.85     0.00    84.94     0.00    0.00    0.00    0.00   0.00   0.00
>>>>>
>>>>> Griff
>>>>>
>>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>>>> read ops are so high when no IO intensive activity such as repair and
>>>>>> compaction is running on node 3. Maybe you can try investigating the logs to
>>>>>> see what's happening.
>>>>>>
>>>>>> Others on the mailing list could also share their views on the
>>>>>> situation.
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sent from Yahoo Mail on Android
>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>>> <ja...@idioplatform.com> wrote:
>>>>>> Hi Anuj,
>>>>>>
>>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>>> following the instructions in Datastax documentation for replacing running
>>>>>> nodes since the nodes were running fine, it was that the servers had been
>>>>>> incorrectly initialised and they thus had less disk space. The status below
>>>>>> shows 2 has significantly higher load, however as I say 2 is operating
>>>>>> normally and is running compactions, so I guess that's not an issue?
>>>>>>
>>>>>> Datacenter: datacenter1
>>>>>> =======================
>>>>>> Status=Up/Down
>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>> --  Address         Load       Tokens  Owns   Host ID
>>>>>>               Rack
>>>>>> UN  1               253.59 GB  256     31.7%
>>>>>>  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>> UN  2               302.23 GB  256     35.3%
>>>>>>  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>> UN  3               265.02 GB  256     33.1%
>>>>>>  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Revisiting the thread I can see that nodetool status had both good
>>>>>>> and bad nodes at same time. How do you replace nodes? When you say bad
>>>>>>> node..I understand that the node is no more usable even though Cassandra is
>>>>>>> UP? Is that correct?
>>>>>>>
>>>>>>> If a node is in bad shape and not working, adding new node may
>>>>>>> trigger streaming huge data from bad node too. Have you considered using
>>>>>>> the procedure for replacing a dead node?
>>>>>>>
>>>>>>> Please share Latest nodetool status.
>>>>>>>
>>>>>>> nodetool output shared earlier:
>>>>>>>
>>>>>>>  `nodetool status` output:
>>>>>>>
>>>>>>>     Status=Up/Down
>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>> ID                               Rack
>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>>> <ja...@idioplatform.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We’ve spent a few days running things but are in the same position.
>>>>>>> To add some more flavour:
>>>>>>>
>>>>>>>
>>>>>>>    - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>>    running in this configuration for a few years without any real issues
>>>>>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>>    brought in to replace two other nodes which had failed RAID0 configuration
>>>>>>>    and thus were lacking in disk space.
>>>>>>>    - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>>    wait, IO and load metrics
>>>>>>>    - We subsequently brought 3 into the ring: as soon as 3 was
>>>>>>>    fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>>    levels. Those same stats on 3, however, sky-rocketed
>>>>>>>    - We’ve confirmed configuration across all three nodes are
>>>>>>>    identical and in line with the recommended production settings
>>>>>>>    - We’ve run a full repair
>>>>>>>    - Node 2 is currently running compactions, 1 & 3 aren’t and have
>>>>>>>    no pending
>>>>>>>    - There is no GC happening from what I can see. Node 1 has a GC
>>>>>>>    log, but that’s not been written to since May last year
>>>>>>>
>>>>>>>
>>>>>>> What we’re seeing at the moment is similar and normal stats on nodes
>>>>>>> 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>>
>>>>>>>
>>>>>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>>
>>>>>>>
>>>>>>> Can you recommend any next steps?
>>>>>>>
>>>>>>> Griff
>>>>>>>
>>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Vickrum,
>>>>>>>>
>>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>>
>>>>>>>> 1. Analysis of a sar report to check system health - CPU, memory, swap,
>>>>>>>> disk, etc.
>>>>>>>> The system seems to be overloaded. This is evident from the mutation drops.
>>>>>>>>
>>>>>>>> 2. Make sure that all recommended Cassandra production settings
>>>>>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>>>>>
>>>>>>>> 3. Run a full repair on the bad node and check data size. The node owns the
>>>>>>>> largest token range but has significantly less data. I doubt that
>>>>>>>> bootstrapping happened properly.
>>>>>>>>
>>>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>>>> compactions by reducing concurrent compactors or compaction throughput
>>>>>>>> (see the sketch after this list).
>>>>>>>>
>>>>>>>> 5. Analyze logs to make sure bootstrapping happened without errors.
>>>>>>>>
>>>>>>>> 6. Look for other common performance problems such as GC pauses to
>>>>>>>> make sure that dropped mutations are not caused by GC pauses.
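A sketch of how points 2 and 4 might be checked and applied; the kernel paths are the usual Linux locations, and the throughput and compactor values are illustrative only:

    $ cat /sys/kernel/mm/transparent_hugepage/enabled   # expect [never] on a Cassandra host
    $ sysctl vm.zone_reclaim_mode                       # expect vm.zone_reclaim_mode = 0
    $ nodetool setcompactionthroughput 16               # throttle compactions to 16 MB/s
    # concurrent_compactors: 2                          # cassandra.yaml; needs a restart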
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Anuj
>>>>>>>>
>>>>>>>> Sent from Yahoo Mail on Android
>>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>>
>>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>>> # nodetool compactionstats
>>>>>>>> pending tasks: 22
>>>>>>>>           compaction type        keyspace           table
>>>>>>>> completed           total      unit  progress
>>>>>>>>                Compactionproduction_analytics    interactions
>>>>>>>> 240410213    161172668724     bytes     0.15%
>>>>>>>>
>>>>>>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>>>>>>> 120815385       226295183     bytes    53.39%
>>>>>>>> Active compaction remaining time :   2h39m58s
>>>>>>>>
>>>>>>>> Worth mentioning that compactions haven't been running on this node
>>>>>>>> particularly often. The node's been performing badly regardless of whether
>>>>>>>> it's compacting or not.
>>>>>>>>
>>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>>
>>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>>>> issue.
>>>>>>>>>
>>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads,
>>>>>>>>> in comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>>>>> a cause.
>>>>>>>>>
>>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>>> bad node (D) has less data.
>>>>>>>>>
>>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far
>>>>>>>>> higher than the rest.
>>>>>>>>>
>>>>>>>>> The only other difference this node has to the rest of the cluster
>>>>>>>>> is that its on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>>>>>> done plenty of testing there and can't see how that would affect
>>>>>>>>> performance on this node so much.
>>>>>>>>>
>>>>>>>>> Nothing of note in system.log.
>>>>>>>>>
>>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> Vic
>>>>>>>>>
>>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>>
>>>>>>>>> Good node:
>>>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>>>> Blocked  All time blocked
>>>>>>>>>     ReadStage                         0         0
>>>>>>>>> 46311521         0                 0
>>>>>>>>>     RequestResponseStage              0         0
>>>>>>>>> 23817366         0                 0
>>>>>>>>>     MutationStage                     0         0
>>>>>>>>> 47389269         0                 0
>>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>>> 11108         0                 0
>>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     GossipStage                       0         0
>>>>>>>>> 5259908         0                 0
>>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     MigrationStage                    0         0
>>>>>>>>> 30         0                 0
>>>>>>>>>     MemoryMeter                       0         0
>>>>>>>>> 16563         0                 0
>>>>>>>>>     FlushWriter                       0         0
>>>>>>>>> 39637         0                26
>>>>>>>>>     ValidationExecutor                0         0
>>>>>>>>> 19013         0                 0
>>>>>>>>>     InternalResponseStage             0         0
>>>>>>>>> 9         0                 0
>>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>>> 38026         0                 0
>>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>>> 81740         0                 0
>>>>>>>>>     MiscStage                         0         0
>>>>>>>>> 19196         0                 0
>>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>>> 23         0                 0
>>>>>>>>>     CompactionExecutor                0         0
>>>>>>>>> 61629         0                 0
>>>>>>>>>     commitlog_archiver                0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     HintedHandoff                     0         0
>>>>>>>>> 63         0                 0
>>>>>>>>>
>>>>>>>>>     Message type           Dropped
>>>>>>>>>     RANGE_SLICE                  0
>>>>>>>>>     READ_REPAIR                  0
>>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>>     BINARY                       0
>>>>>>>>>     READ                       640
>>>>>>>>>     MUTATION                     0
>>>>>>>>>     _TRACE                       0
>>>>>>>>>     REQUEST_RESPONSE             0
>>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>>
>>>>>>>>> Bad node:
>>>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>>>> Blocked  All time blocked
>>>>>>>>>     ReadStage                        32       113
>>>>>>>>> 52216         0                 0
>>>>>>>>>     RequestResponseStage              0         0
>>>>>>>>> 4167         0                 0
>>>>>>>>>     MutationStage                     0         0
>>>>>>>>> 127559         0                 0
>>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>>> 125         0                 0
>>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     GossipStage                       0         0
>>>>>>>>> 9965         0                 0
>>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     MigrationStage                    0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     MemoryMeter                       0         0
>>>>>>>>> 24         0                 0
>>>>>>>>>     FlushWriter                       0         0
>>>>>>>>> 27         0                 1
>>>>>>>>>     ValidationExecutor                0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     InternalResponseStage             0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>>> 96         0                 0
>>>>>>>>>     MiscStage                         0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>>> 10         0                 0
>>>>>>>>>     CompactionExecutor                1         1
>>>>>>>>> 73         0                 0
>>>>>>>>>     commitlog_archiver                0         0
>>>>>>>>> 0         0                 0
>>>>>>>>>     HintedHandoff                     0         0
>>>>>>>>> 15         0                 0
>>>>>>>>>
>>>>>>>>>     Message type           Dropped
>>>>>>>>>     RANGE_SLICE                130
>>>>>>>>>     READ_REPAIR                  1
>>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>>     BINARY                       0
>>>>>>>>>     READ                     31032
>>>>>>>>>     MUTATION                   865
>>>>>>>>>     _TRACE                       0
>>>>>>>>>     REQUEST_RESPONSE             7
>>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] `nodetool status` output:
>>>>>>>>>
>>>>>>>>>     Status=Up/Down
>>>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>>>> ID                               Rack
>>>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>>>
>>>>>>>>> [2] Disk read/write ops:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>>
>>>>>>>>> [3] Network in/out:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>>
>>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by Kai Wang <de...@gmail.com>.
James,

I may miss something. You mentioned your cluster had RF=3. Then why does
"nodetool status" show each node owns 1/3 of the data especially after a
full repair?

On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <
james.griffin@idioplatform.com> wrote:

> Hi Kai,
>
> Below - nothing going on that I can see
>
> $ nodetool netstats
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0           6326
> Responses                       n/a         0         219356
>
>
>
> Best wishes,
>
> Griff
>
> [image: idioplatform] <http://idioplatform.com/>James "Griff" Griffin
> CTO
> Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
> @imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin
> idio helps major brands and publishers to build closer relationships with
> their customers and prospects by learning from their content consumption
> and acting on that insight. We call it Content Intelligence, and it
> integrates with your existing marketing technology to provide detailed
> customer interest profiles in real-time across all channels, and to
> personalize content into every channel for every customer. See
> http://idioplatform.com
> <https://t.yesware.com/tl/0e637e4938676b6f3897def79d0810a71e59612e/10068de2036c2daf922e0a879bb2fe92/9dae8be0f7693bf2b28a88cc4b38c554?ytl=http%3A%2F%2Fidioplatform.com%2F> for
> more information.
>
> On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:
>
>> James,
>>
>> Can you post the result of "nodetool netstats" on the bad node?
>>
>> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
>> james.griffin@idioplatform.com> wrote:
>>
>>> A summary of what we've done this morning:
>>>
>>>    - Noted that there are no GCInspector lines in system.log on bad
>>>    node (there are GCInspector logs on other healthy nodes)
>>>    - Turned on GC logging, noted that we had logs which stated the total
>>>    time for which application threads were stopped was high - ~10s.
>>>    - Not seeing failures of any kind (promotion or concurrent mark)
>>>    - Attached VisualVM: noted that heap usage was very low (~5% usage
>>>    and stable) and it didn't display the hallmarks of GC activity. PermGen also
>>>    very stable
>>>    - Downloaded GC logs and examined in GC Viewer. Noted that:
>>>    - We had lots of pauses (again around 10s), but no full GC.
>>>       - From a 2,300s sample, just over 2,000s were spent with threads
>>>       paused
>>>       - Spotted many small GCs in the new space - realised that Xmn
>>>       value was very low (200M against a heap size of 3750M). Increased Xmn to
>>>       937M - no change in server behaviour (high load, high reads/s on disk, high
>>>       CPU wait)
>>>
>>> Current output of jstat:
>>>
>>>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>> 2 0.00  45.20  12.82  26.84  76.21   2333   63.684     2    0.039
>>> 63.724
>>> 3 63.58   0.00  33.68   8.04  75.19     14    1.812     2    0.103
>>>  1.915
>>>
>>> Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise than
>>> 2 (which has normal load statistics).
>>>
>>> Anywhere else you can recommend we look?
>>>
>>> Griff
>>>
>>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in>
>>> wrote:
>>>
>>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>>> cause for that.
>>>> Can you just search for the word GCInspector in system.log and share the
>>>> frequency of minor and full GC? Moreover, are you printing promotion
>>>> failures in the GC logs? Why is full GC getting triggered - promotion
>>>> failures or concurrent mode failures?
>>>>
>>>> If you are on CMS, you need to fine tune your heap options to address
>>>> full gc.
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>>> <ja...@idioplatform.com> wrote:
>>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>>> of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>>> differences, though
>>>> comparing the startup flags on the two machines show the GC config is
>>>> identical.:
>>>>
>>>> $ jstat -gcutil
>>>>    S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>>> 2  5.08   0.00  55.72  18.24  59.90  25986  619.827    28    1.597
>>>>  621.424
>>>> 3  0.00   0.00  22.79  17.87  59.99 422600 11225.979   668   57.383
>>>> 11283.361
>>>>
>>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>>
>>>> $ iostat -dmx md0
>>>>
>>>>   Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
>>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>>> 2 md0               0.00     0.00  339.00    0.00     9.77     0.00
>>>>  59.00     0.00    0.00    0.00    0.00   0.00   0.00
>>>> 3 md0               0.00     0.00 2069.00    1.00    85.85     0.00
>>>>  84.94     0.00    0.00    0.00    0.00   0.00   0.00
>>>>
>>>> Griff
>>>>
>>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>>> read ops are so high when no IO intensive activity such as repair and
>>>>> compaction is running on node 3. Maybe you can try investigating the logs to
>>>>> see what's happening.
>>>>>
>>>>> Others on the mailing list could also share their views on the
>>>>> situation.
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>>
>>>>>
>>>>> Sent from Yahoo Mail on Android
>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>
>>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>>> <ja...@idioplatform.com> wrote:
>>>>> Hi Anuj,
>>>>>
>>>>> Below is the output of nodetool status. The nodes were replaced
>>>>> following the instructions in Datastax documentation for replacing running
>>>>> nodes since the nodes were running fine, it was that the servers had been
>>>>> incorrectly initialised and they thus had less disk space. The status below
>>>>> shows 2 has significantly higher load, however as I say 2 is operating
>>>>> normally and is running compactions, so I guess that's not an issue?
>>>>>
>>>>> Datacenter: datacenter1
>>>>> =======================
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> --  Address         Load       Tokens  Owns   Host ID
>>>>>               Rack
>>>>> UN  1               253.59 GB  256     31.7%
>>>>>  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>> UN  2               302.23 GB  256     35.3%
>>>>>  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>> UN  3               265.02 GB  256     33.1%
>>>>>  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>>>>
>>>>> Griff
>>>>>
>>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Revisiting the thread I can see that nodetool status had both good
>>>>>> and bad nodes at same time. How do you replace nodes? When you say bad
>>>>>> node..I understand that the node is no more usable even though Cassandra is
>>>>>> UP? Is that correct?
>>>>>>
>>>>>> If a node is in bad shape and not working, adding new node may
>>>>>> trigger streaming huge data from bad node too. Have you considered using
>>>>>> the procedure for replacing a dead node?
>>>>>>
>>>>>> Please share Latest nodetool status.
>>>>>>
>>>>>> nodetool output shared earlier:
>>>>>>
>>>>>>  `nodetool status` output:
>>>>>>
>>>>>>     Status=Up/Down
>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>> ID                               Rack
>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>> Sent from Yahoo Mail on Android
>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>>> <ja...@idioplatform.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> We’ve spent a few days running things but are in the same position.
>>>>>> To add some more flavour:
>>>>>>
>>>>>>
>>>>>>    - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>>    running in this configuration for a few years without any real issues
>>>>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>>    brought in to replace two other nodes which had failed RAID0 configuration
>>>>>>    and thus were lacking in disk space.
>>>>>>    - When node 2 was brought into the ring, it exhibited high CPU
>>>>>>    wait, IO and load metrics
>>>>>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>>>>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>>    levels. Those same stats on 3, however, sky-rocketed
>>>>>>    - We’ve confirmed configuration across all three nodes are
>>>>>>    identical and in line with the recommended production settings
>>>>>>    - We’ve run a full repair
>>>>>>    - Node 2 is currently running compactions, 1 & 3 aren’t and have
>>>>>>    no pending
>>>>>>    - There is no GC happening from what I can see. Node 1 has a GC
>>>>>>    log, but that’s not been written to since May last year
>>>>>>
>>>>>>
>>>>>> What we’re seeing at the moment is similar and normal stats on nodes
>>>>>> 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>>
>>>>>>
>>>>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>>
>>>>>>
>>>>>> Can you recommend any next steps?
>>>>>>
>>>>>> Griff
>>>>>>
>>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Vickrum,
>>>>>>>
>>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>>
>>>>>>> 1. Analysis of a sar report to check system health - CPU, memory, swap,
>>>>>>> disk, etc.
>>>>>>> The system seems to be overloaded. This is evident from the mutation drops.
>>>>>>>
>>>>>>> 2. Make sure that all recommended Cassandra production settings
>>>>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>>>>
>>>>>>> 3. Run a full repair on the bad node and check data size. The node owns the
>>>>>>> largest token range but has significantly less data. I doubt that
>>>>>>> bootstrapping happened properly.
>>>>>>>
>>>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>>>> compactions by reducing concurrent compactors or compaction throughput.
>>>>>>>
>>>>>>> 5. Analyze logs to make sure bootstrapping happened without errors.
>>>>>>>
>>>>>>> 6. Look for other common performance problems such as GC pauses to
>>>>>>> make sure that dropped mutations are not caused by GC pauses.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Anuj
>>>>>>>
>>>>>>> Sent from Yahoo Mail on Android
>>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>>
>>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>>> <vi...@idioplatform.com> wrote:
>>>>>>> # nodetool compactionstats
>>>>>>> pending tasks: 22
>>>>>>>           compaction type        keyspace           table
>>>>>>> completed           total      unit  progress
>>>>>>>                Compactionproduction_analytics    interactions
>>>>>>> 240410213    161172668724     bytes     0.15%
>>>>>>>
>>>>>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>>>>>> 120815385       226295183     bytes    53.39%
>>>>>>> Active compaction remaining time :   2h39m58s
>>>>>>>
>>>>>>> Worth mentioning that compactions haven't been running on this node
>>>>>>> particularly often. The node's been performing badly regardless of whether
>>>>>>> it's compacting or not.
>>>>>>>
>>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>>
>>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>>> issue.
>>>>>>>>
>>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>>>>>> cause.
>>>>>>>>
>>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The
>>>>>>>> bad node (D) has less data.
>>>>>>>>
>>>>>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>>>>>> than the rest.
>>>>>>>>
>>>>>>>> The only other difference this node has to the rest of the cluster
>>>>>>>> is that its on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>>>>> done plenty of testing there and can't see how that would affect
>>>>>>>> performance on this node so much.
>>>>>>>>
>>>>>>>> Nothing of note in system.log.
>>>>>>>>
>>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>> Vic
>>>>>>>>
>>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>>
>>>>>>>> Good node:
>>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>>> Blocked  All time blocked
>>>>>>>>     ReadStage                         0         0
>>>>>>>> 46311521         0                 0
>>>>>>>>     RequestResponseStage              0         0
>>>>>>>> 23817366         0                 0
>>>>>>>>     MutationStage                     0         0
>>>>>>>> 47389269         0                 0
>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>> 11108         0                 0
>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>> 0         0                 0
>>>>>>>>     GossipStage                       0         0
>>>>>>>> 5259908         0                 0
>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>> 0         0                 0
>>>>>>>>     MigrationStage                    0         0
>>>>>>>> 30         0                 0
>>>>>>>>     MemoryMeter                       0         0
>>>>>>>> 16563         0                 0
>>>>>>>>     FlushWriter                       0         0
>>>>>>>> 39637         0                26
>>>>>>>>     ValidationExecutor                0         0
>>>>>>>> 19013         0                 0
>>>>>>>>     InternalResponseStage             0         0
>>>>>>>> 9         0                 0
>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>> 38026         0                 0
>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>> 81740         0                 0
>>>>>>>>     MiscStage                         0         0
>>>>>>>> 19196         0                 0
>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>> 23         0                 0
>>>>>>>>     CompactionExecutor                0         0
>>>>>>>> 61629         0                 0
>>>>>>>>     commitlog_archiver                0         0
>>>>>>>> 0         0                 0
>>>>>>>>     HintedHandoff                     0         0
>>>>>>>> 63         0                 0
>>>>>>>>
>>>>>>>>     Message type           Dropped
>>>>>>>>     RANGE_SLICE                  0
>>>>>>>>     READ_REPAIR                  0
>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>     BINARY                       0
>>>>>>>>     READ                       640
>>>>>>>>     MUTATION                     0
>>>>>>>>     _TRACE                       0
>>>>>>>>     REQUEST_RESPONSE             0
>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>
>>>>>>>> Bad node:
>>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>>> Blocked  All time blocked
>>>>>>>>     ReadStage                        32       113
>>>>>>>> 52216         0                 0
>>>>>>>>     RequestResponseStage              0         0
>>>>>>>> 4167         0                 0
>>>>>>>>     MutationStage                     0         0
>>>>>>>> 127559         0                 0
>>>>>>>>     ReadRepairStage                   0         0
>>>>>>>> 125         0                 0
>>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>>> 0         0                 0
>>>>>>>>     GossipStage                       0         0
>>>>>>>> 9965         0                 0
>>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>>> 0         0                 0
>>>>>>>>     MigrationStage                    0         0
>>>>>>>> 0         0                 0
>>>>>>>>     MemoryMeter                       0         0
>>>>>>>> 24         0                 0
>>>>>>>>     FlushWriter                       0         0
>>>>>>>> 27         0                 1
>>>>>>>>     ValidationExecutor                0         0
>>>>>>>> 0         0                 0
>>>>>>>>     InternalResponseStage             0         0
>>>>>>>> 0         0                 0
>>>>>>>>     AntiEntropyStage                  0         0
>>>>>>>> 0         0                 0
>>>>>>>>     MemtablePostFlusher               0         0
>>>>>>>> 96         0                 0
>>>>>>>>     MiscStage                         0         0
>>>>>>>> 0         0                 0
>>>>>>>>     PendingRangeCalculator            0         0
>>>>>>>> 10         0                 0
>>>>>>>>     CompactionExecutor                1         1
>>>>>>>> 73         0                 0
>>>>>>>>     commitlog_archiver                0         0
>>>>>>>> 0         0                 0
>>>>>>>>     HintedHandoff                     0         0
>>>>>>>> 15         0                 0
>>>>>>>>
>>>>>>>>     Message type           Dropped
>>>>>>>>     RANGE_SLICE                130
>>>>>>>>     READ_REPAIR                  1
>>>>>>>>     PAGED_RANGE                  0
>>>>>>>>     BINARY                       0
>>>>>>>>     READ                     31032
>>>>>>>>     MUTATION                   865
>>>>>>>>     _TRACE                       0
>>>>>>>>     REQUEST_RESPONSE             7
>>>>>>>>     COUNTER_MUTATION             0
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] `nodetool status` output:
>>>>>>>>
>>>>>>>>     Status=Up/Down
>>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>>> ID                               Rack
>>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>>
>>>>>>>> [2] Disk read/write ops:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>>
>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>>
>>>>>>>> [3] Network in/out:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>>
>>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
Hi Kai,

Below - nothing going on that I can see

$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0           6326
Responses                       n/a         0         219356
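
(In case it helps, a way to catch anything that starts later would be to sample this on a loop rather than rely on a one-off run, e.g. something like:

    while true; do date; nodetool netstats | head -n 6; nodetool compactionstats; sleep 30; done

so any streaming or compaction that kicks off afterwards shows up.)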



Best wishes,

Griff

James "Griff" Griffin
CTO
Switchboard: +44 (0)20 3540 1920 | Direct: +44 (0)7763 139 206 | Twitter:
@imaginaryroots <http://twitter.com/imaginaryroots> | Skype: j.s.griffin

On 14 January 2016 at 14:22, Kai Wang <de...@gmail.com> wrote:

> James,
>
> Can you post the result of "nodetool netstats" on the bad node?
>
> On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <
> james.griffin@idioplatform.com> wrote:
>
>> A summary of what we've done this morning:
>>
>>    - Noted that there are no GCInspector lines in system.log on bad node
>>    (there are GCInspector logs on other healthy nodes)
>>    - Turned on GC logging, noted that we had logs which stated our total
>>    time for which application threads were stopped was high - ~10s.
>>    - Not seeing failures of any kind (promotion or concurrent mark)
>>    - Attached Visual VM: noted that heap usage was very low (~5% usage
>>    and stable) and it didn't display the hallmarks of GC activity. PermGen also
>>    very stable
>>    - Downloaded GC logs and examined in GC Viewer. Noted that:
>>    - We had lots of pauses (again around 10s), but no full GC.
>>       - From a 2,300s sample, just over 2,000s were spent with threads
>>       paused
>>       - Spotted many small GCs in the new space - realised that Xmn
>>       value was very low (200M against a heap size of 3750M). Increased Xmn to
>>       937M - no change in server behaviour (high load, high reads/s on disk, high
>>       CPU wait)
>>
>> Current output of jstat:
>>
>>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>> 2 0.00  45.20  12.82  26.84  76.21   2333   63.684     2    0.039   63.724
>> 3 63.58   0.00  33.68   8.04  75.19     14    1.812     2    0.103
>>  1.915
>>
>> Correct me if I'm wrong, but it seems 3 is a lot more healthy, GC-wise, than
>> 2 (which has normal load statistics).
>>
>> Anywhere else you can recommend we look?
>>
>> Griff
>>
>> On 14 January 2016 at 01:25, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>
>>> Ok. I saw dropped mutations on your cluster and full gc is a common
>>> cause for that.
>>> Can you just search the word GCInspector in system.log and share the
>>> frequency of minor and full gc. Moreover, are you printing promotion
>>> failures in gc logs? Why is full gc getting triggered - promotion failures
>>> or concurrent mode failures?
>>>
>>> If you are on CMS, you need to fine tune your heap options to address
>>> full gc.
>>>
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>>> <ja...@idioplatform.com> wrote:
>>> I think I was incorrect in assuming GC wasn't an issue due to the lack
>>> of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked
>>> differences, though
>>> comparing the startup flags on the two machines show the GC config is
>>> identical.:
>>>
>>> $ jstat -gcutil
>>>    S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>>> 2  5.08   0.00  55.72  18.24  59.90  25986  619.827    28    1.597
>>>  621.424
>>> 3  0.00   0.00  22.79  17.87  59.99 422600 11225.979   668   57.383
>>> 11283.361
>>>
>>> Here's typical output for iostat on nodes 2 & 3 as well:
>>>
>>> $ iostat -dmx md0
>>>
>>>   Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
>>> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>> 2 md0               0.00     0.00  339.00    0.00     9.77     0.00
>>>  59.00     0.00    0.00    0.00    0.00   0.00   0.00
>>> 3 md0               0.00     0.00 2069.00    1.00    85.85     0.00
>>>  84.94     0.00    0.00    0.00    0.00   0.00   0.00
>>>
>>> Griff
>>>
>>> On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in>
>>> wrote:
>>>
>>>> Node 2 has slightly higher data but that should be ok. Not sure how
>>>> read ops are so high when no IO intensive activity such as repair and
>>>> compaction is running on node 3. Maybe you can try investigating the logs to
>>>> see what's happening.
>>>>
>>>> Others on the mailing list could also share their views on the
>>>> situation.
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>>
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>>> <ja...@idioplatform.com> wrote:
>>>> Hi Anuj,
>>>>
>>>> Below is the output of nodetool status. The nodes were replaced
>>>> following the instructions in Datastax documentation for replacing running
>>>> nodes since the nodes were running fine, it was that the servers had been
>>>> incorrectly initialised and they thus had less disk space. The status below
>>>> shows 2 has significantly higher load, however as I say 2 is operating
>>>> normally and is running compactions, so I guess that's not an issue?
>>>>
>>>> Datacenter: datacenter1
>>>> =======================
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address         Load       Tokens  Owns   Host ID
>>>>             Rack
>>>> UN  1               253.59 GB  256     31.7%
>>>>  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>> UN  2               302.23 GB  256     35.3%
>>>>  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>> UN  3               265.02 GB  256     33.1%
>>>>  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>>>
>>>> Griff
>>>>
>>>> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Revisiting the thread I can see that nodetool status had both good and
>>>>> bad nodes at same time. How do you replace nodes? When you say bad node..I
>>>>> understand that the node is no more usable even though Cassandra is UP? Is
>>>>> that correct?
>>>>>
>>>>> If a node is in bad shape and not working, adding new node may trigger
>>>>> streaming huge data from bad node too. Have you considered using the
>>>>> procedure for replacing a dead node?
>>>>>
>>>>> Please share Latest nodetool status.
>>>>>
>>>>> nodetool output shared earlier:
>>>>>
>>>>>  `nodetool status` output:
>>>>>
>>>>>     Status=Up/Down
>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>> ID                               Rack
>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> Sent from Yahoo Mail on Android
>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>
>>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>>> <ja...@idioplatform.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> We’ve spent a few days running things but are in the same position. To
>>>>> add some more flavour:
>>>>>
>>>>>
>>>>>    - We have a 3-node ring, replication factor = 3. We’ve been
>>>>>    running in this configuration for a few years without any real issues
>>>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>>    brought in to replace two other nodes which had failed RAID0 configuration
>>>>>    and thus were lacking in disk space.
>>>>>    - When node 2 was brought into the ring, it exhibited high CPU
>>>>>    wait, IO and load metrics
>>>>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>>>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>>    levels. Those same stats on 3, however, sky-rocketed
>>>>>    - We’ve confirmed configuration across all three nodes are
>>>>>    identical and in line with the recommended production settings
>>>>>    - We’ve run a full repair
>>>>>    - Node 2 is currently running compactions, 1 & 3 aren’t and have
>>>>>    no pending
>>>>>    - There is no GC happening from what I can see. Node 1 has a GC
>>>>>    log, but that’s not been written to since May last year
>>>>>
>>>>>
>>>>> What we’re seeing at the moment is similar and normal stats on nodes 1
>>>>> & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>>
>>>>>
>>>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>>
>>>>>
>>>>> Can you recommend any next steps?
>>>>>
>>>>> Griff
>>>>>
>>>>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in>
>>>>> wrote:
>>>>>
>>>>>> Hi Vickrum,
>>>>>>
>>>>>> I would have proceeded with diagnosis as follows:
>>>>>>
>>>>>> 1. Analysis of sar report to check system health -cpu memory swap
>>>>>> disk etc.
>>>>>> System seems to be overloaded. This is evident from mutation drops.
>>>>>>
>>>>>> 2. Make sure that  all recommended Cassandra production settings
>>>>>> available at Datastax site are applied ,disable zone reclaim and THP.
>>>>>>
>>>>>> 3.Run full Repair on bad node and check data size. Node is owner of
>>>>>> maximum token range but has significantly lower data. I doubt that
>>>>>> bootstrapping happened properly.
>>>>>>
>>>>>> 4.Compactionstats shows 22 pending compactions. Try throttling
>>>>>> compactions via reducing concurrent compactors or compaction throughput.
>>>>>>
>>>>>> 5.Analyze logs to make sure bootstrapping happened without errors.
>>>>>>
>>>>>> 6. Look for other common performance problems such as GC pauses to
>>>>>> make sure that dropped mutations are not caused by GC pauses.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Anuj
>>>>>>
>>>>>> Sent from Yahoo Mail on Android
>>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>>
>>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>>> <vi...@idioplatform.com> wrote:
>>>>>> # nodetool compactionstats
>>>>>> pending tasks: 22
>>>>>>           compaction type        keyspace           table
>>>>>> completed           total      unit  progress
>>>>>>                Compactionproduction_analytics    interactions
>>>>>> 240410213    161172668724     bytes     0.15%
>>>>>>
>>>>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>>>>> 120815385       226295183     bytes    53.39%
>>>>>> Active compaction remaining time :   2h39m58s
>>>>>>
>>>>>> Worth mentioning that compactions haven't been running on this node
>>>>>> particularly often. The node's been performing badly regardless of whether
>>>>>> it's compacting or not.
>>>>>>
>>>>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>>>>>
>>>>>>> What’s your output of `nodetool compactionstats`?
>>>>>>>
>>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <
>>>>>>> vickrum.loi@idioplatform.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We recently added a new node to our cluster in order to replace a
>>>>>>> node that died (hardware failure we believe). For the next two weeks it had
>>>>>>> high disk and network activity. We replaced the server, but it's happened
>>>>>>> again. We've looked into memory allowances, disk performance, number of
>>>>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>>>>> issue.
>>>>>>>
>>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>>>>> cause.
>>>>>>>
>>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>>>> node (D) has less data.
>>>>>>>
>>>>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>>>>> than the rest.
>>>>>>>
>>>>>>> The only other difference this node has to the rest of the cluster
>>>>>>> is that its on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>>>> done plenty of testing there and can't see how that would affect
>>>>>>> performance on this node so much.
>>>>>>>
>>>>>>> Nothing of note in system.log.
>>>>>>>
>>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>>
>>>>>>> Best wishes,
>>>>>>> Vic
>>>>>>>
>>>>>>> [0] `nodetool tpstats` output:
>>>>>>>
>>>>>>> Good node:
>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>> Blocked  All time blocked
>>>>>>>     ReadStage                         0         0
>>>>>>> 46311521         0                 0
>>>>>>>     RequestResponseStage              0         0
>>>>>>> 23817366         0                 0
>>>>>>>     MutationStage                     0         0
>>>>>>> 47389269         0                 0
>>>>>>>     ReadRepairStage                   0         0
>>>>>>> 11108         0                 0
>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>> 0         0                 0
>>>>>>>     GossipStage                       0         0
>>>>>>> 5259908         0                 0
>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>> 0         0                 0
>>>>>>>     MigrationStage                    0         0
>>>>>>> 30         0                 0
>>>>>>>     MemoryMeter                       0         0
>>>>>>> 16563         0                 0
>>>>>>>     FlushWriter                       0         0
>>>>>>> 39637         0                26
>>>>>>>     ValidationExecutor                0         0
>>>>>>> 19013         0                 0
>>>>>>>     InternalResponseStage             0         0
>>>>>>> 9         0                 0
>>>>>>>     AntiEntropyStage                  0         0
>>>>>>> 38026         0                 0
>>>>>>>     MemtablePostFlusher               0         0
>>>>>>> 81740         0                 0
>>>>>>>     MiscStage                         0         0
>>>>>>> 19196         0                 0
>>>>>>>     PendingRangeCalculator            0         0
>>>>>>> 23         0                 0
>>>>>>>     CompactionExecutor                0         0
>>>>>>> 61629         0                 0
>>>>>>>     commitlog_archiver                0         0
>>>>>>> 0         0                 0
>>>>>>>     HintedHandoff                     0         0
>>>>>>> 63         0                 0
>>>>>>>
>>>>>>>     Message type           Dropped
>>>>>>>     RANGE_SLICE                  0
>>>>>>>     READ_REPAIR                  0
>>>>>>>     PAGED_RANGE                  0
>>>>>>>     BINARY                       0
>>>>>>>     READ                       640
>>>>>>>     MUTATION                     0
>>>>>>>     _TRACE                       0
>>>>>>>     REQUEST_RESPONSE             0
>>>>>>>     COUNTER_MUTATION             0
>>>>>>>
>>>>>>> Bad node:
>>>>>>>     Pool Name                    Active   Pending      Completed
>>>>>>> Blocked  All time blocked
>>>>>>>     ReadStage                        32       113
>>>>>>> 52216         0                 0
>>>>>>>     RequestResponseStage              0         0
>>>>>>> 4167         0                 0
>>>>>>>     MutationStage                     0         0
>>>>>>> 127559         0                 0
>>>>>>>     ReadRepairStage                   0         0
>>>>>>> 125         0                 0
>>>>>>>     ReplicateOnWriteStage             0         0
>>>>>>> 0         0                 0
>>>>>>>     GossipStage                       0         0
>>>>>>> 9965         0                 0
>>>>>>>     CacheCleanupExecutor              0         0
>>>>>>> 0         0                 0
>>>>>>>     MigrationStage                    0         0
>>>>>>> 0         0                 0
>>>>>>>     MemoryMeter                       0         0
>>>>>>> 24         0                 0
>>>>>>>     FlushWriter                       0         0
>>>>>>> 27         0                 1
>>>>>>>     ValidationExecutor                0         0
>>>>>>> 0         0                 0
>>>>>>>     InternalResponseStage             0         0
>>>>>>> 0         0                 0
>>>>>>>     AntiEntropyStage                  0         0
>>>>>>> 0         0                 0
>>>>>>>     MemtablePostFlusher               0         0
>>>>>>> 96         0                 0
>>>>>>>     MiscStage                         0         0
>>>>>>> 0         0                 0
>>>>>>>     PendingRangeCalculator            0         0
>>>>>>> 10         0                 0
>>>>>>>     CompactionExecutor                1         1
>>>>>>> 73         0                 0
>>>>>>>     commitlog_archiver                0         0
>>>>>>> 0         0                 0
>>>>>>>     HintedHandoff                     0         0
>>>>>>> 15         0                 0
>>>>>>>
>>>>>>>     Message type           Dropped
>>>>>>>     RANGE_SLICE                130
>>>>>>>     READ_REPAIR                  1
>>>>>>>     PAGED_RANGE                  0
>>>>>>>     BINARY                       0
>>>>>>>     READ                     31032
>>>>>>>     MUTATION                   865
>>>>>>>     _TRACE                       0
>>>>>>>     REQUEST_RESPONSE             7
>>>>>>>     COUNTER_MUTATION             0
>>>>>>>
>>>>>>>
>>>>>>> [1] `nodetool status` output:
>>>>>>>
>>>>>>>     Status=Up/Down
>>>>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>>>>     --  Address         Load       Tokens  Owns   Host
>>>>>>> ID                               Rack
>>>>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>>
>>>>>>> [2] Disk read/write ops:
>>>>>>>
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>>
>>>>>>> [3] Network in/out:
>>>>>>>
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>>>
>>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by Kai Wang <de...@gmail.com>.
James,

Can you post the result of "nodetool netstats" on the bad node?
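
(Either run it on that box directly, or point nodetool at it with something like `nodetool -h <bad-node-address> netstats` if JMX is reachable from where you run it.)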


Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
A summary of what we've done this morning:

   - Noted that there are no GCInspector lines in system.log on bad node
   (there are GCInspector logs on other healthy nodes)
   - Turned on GC logging (a sketch of typical flags is below); noted that we had
   logs which stated our total time for which application threads were stopped
   was high - ~10s.
   - Not seeing failures of any kind (promotion or concurrent mark)
   - Attached Visual VM: noted that heap usage was very low (~5% usage and
   stable) and it didn't display the hallmarks of GC activity. PermGen also very
   stable
   - Downloaded GC logs and examined in GC Viewer. Noted that:
   - We had lots of pauses (again around 10s), but no full GC.
      - From a 2,300s sample, just over 2,000s were spent with threads
      paused
      - Spotted many small GCs in the new space - realised that Xmn value
      was very low (200M against a heap size of 3750M). Increased Xmn to 937M -
      no change in server behaviour (high load, high reads/s on disk, high CPU
      wait)
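
For reference, GC logging of this kind is typically switched on with flags
along these lines in cassandra-env.sh (the log path is just an example, and our
exact list may differ slightly), together with the heap numbers above
(-Xmx3750M, -Xmn now 937M):

    -Xloggc:/var/log/cassandra/gc.log
    -verbose:gc
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintPromotionFailure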

Current output of jstat:

  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
2 0.00  45.20  12.82  26.84  76.21   2333   63.684     2    0.039   63.724
3 63.58   0.00  33.68   8.04  75.19     14    1.812     2    0.103    1.915
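
(These are the standard `jstat -gcutil` columns: S0/S1/E/O/P are survivor,
eden, old and perm occupancy as a percentage of capacity; YGC/YGCT are the
young-GC count and total seconds; FGC/FGCT the same for full GCs; GCT is total
GC time in seconds. A repeating sample can be taken with something like
`jstat -gcutil <cassandra-pid> 5s`.)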

Correct me if I'm wrong, but it seems 3 is a lot more healthy, GC-wise, than 2
(which has normal load statistics).

Anywhere else you can recommend we look?

Griff


Re: New node has high network and disk usage.

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Ok. I saw dropped mutations on your cluster and full gc is a common cause for that. Can you just search the word GCInspector in system.log and share the frequency of minor and full gc? Moreover, are you printing promotion failures in gc logs? Why is full gc getting triggered - promotion failures or concurrent mode failures?
If you are on CMS, you need to fine-tune your heap options to address full gc.
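
For instance (paths are examples, adjust to your install):

    grep -c GCInspector /var/log/cassandra/system.log
    grep -Ei 'promotion failed|concurrent mode failure' /var/log/cassandra/gc.log

The first gives a rough count of the GC pauses Cassandra considered worth logging; the second shows whether CMS is falling back to stop-the-world full collections.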


Thanks
Anuj
Sent from Yahoo Mail on Android 
 
On Thu, 14 Jan, 2016 at 12:57 am, James Griffin <ja...@idioplatform.com> wrote:
I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 show some fairly marked differences, though comparing the startup flags on the two machines show the GC config is identical:

$ jstat -gcutil
   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
2  5.08   0.00  55.72  18.24  59.90  25986  619.827    28    1.597  621.424
3  0.00   0.00  22.79  17.87  59.99 422600 11225.979   668   57.383 11283.361
Here's typical output for iostat on nodes 2 & 3 as well:
$ iostat -dmx md0
  Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
2 md0               0.00     0.00  339.00    0.00     9.77     0.00    59.00     0.00    0.00    0.00    0.00   0.00   0.00
3 md0               0.00     0.00 2069.00    1.00    85.85     0.00    84.94     0.00    0.00    0.00    0.00   0.00   0.00
Griff

On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in> wrote:

Node 2 has slightly higher data but that should be ok. Not sure how read ops are so high when no IO intensive activity such as repair and compaction is running on node 3. Maybe you can try investigating the logs to see what's happening.
Others on the mailing list could also share their views on the situation.

Thanks
Anuj


Sent from Yahoo Mail on Android 
 
On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <ja...@idioplatform.com> wrote:
Hi Anuj,
Below is the output of nodetool status. The nodes were replaced following the instructions in Datastax documentation for replacing running nodes since the nodes were running fine, it was that the servers had been incorrectly initialised and they thus had less disk space. The status below shows 2 has significantly higher load, however as I say 2 is operating normally and is running compactions, so I guess that's not an issue?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  1               253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2               302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3               265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
Griff

On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:

Hi,
Revisiting the thread I can see that nodetool status had both good and bad nodes at same time. How do you replace nodes? When you say bad node..I understand that the node is no more usable even though Cassandra is UP? Is that correct?
If a node is in bad shape and not working, adding new node may trigger streaming huge data from bad node too. Have you considered using the procedure for replacing a dead node?
Please share Latest nodetool status.
nodetool output shared earlier:
 `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1



Thanks
Anuj
Sent from Yahoo Mail on Android 
 
 On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin<ja...@idioplatform.com> wrote:   Hi all, 
We’ve spent a few days running things but are in the same position. To add some more flavour:
   
   - We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
   - Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had failed RAID0 configuration and thus were lacking in disk space.
   - When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
   - We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
   - We’ve confirmed configuration across all three nodes are identical and in line with the recommended production settings
   - We’ve run a full repair
   - Node 2 is currently running compactions, 1 & 3 aren’t and have no pending
   - There is no GC happening from what I can see. Node 1 has a GC log, but that’s not been written to since May last year

What we’re seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
   
   1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
   2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
   3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s

Can you recommend any next steps? 
Griff

On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:

Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analyze a sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure that all the recommended Cassandra production settings available on the Datastax site are applied; in particular, disable zone reclaim and THP (see the sketch after this list).
3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyze the logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems, such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.
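A rough sketch of points 1, 2 and 4 on a typical Linux box (the paths and the 16 MB/s value are assumptions; the zone reclaim/THP changes shown here do not persist across reboots):

    # 1. quick system health check: CPU, memory, I/O
    sar -u -r -b 1 5
    # 2. disable NUMA zone reclaim and transparent huge pages
    echo 0 > /proc/sys/vm/zone_reclaim_mode
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    # 4. throttle compaction I/O (concurrent_compactors itself is set in cassandra.yaml)
    nodetool setcompactionthroughput 16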

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
 On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi<vi...@idioplatform.com> wrote:   # nodetool compactionstats
pending tasks: 22
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction  production_analytics               interactions       240410213    161172668724     bytes     0.15%
               Compaction  production_decisions  decisions.decisions_q_idx       120815385       226295183     bytes    53.39%
Active compaction remaining time :   2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:

What’s your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk Activity[2] and Network activity[3] on this node is far higher than the rest.

The only other difference this node has to the rest of the cluster is that its on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic

[0] `nodetool tpstats` output:

Good node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                         0         0       46311521         0                 0
    RequestResponseStage              0         0       23817366         0                 0
    MutationStage                     0         0       47389269         0                 0
    ReadRepairStage                   0         0          11108         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0        5259908         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0             30         0                 0
    MemoryMeter                       0         0          16563         0                 0
    FlushWriter                       0         0          39637         0                26
    ValidationExecutor                0         0          19013         0                 0
    InternalResponseStage             0         0              9         0                 0
    AntiEntropyStage                  0         0          38026         0                 0
    MemtablePostFlusher               0         0          81740         0                 0
    MiscStage                         0         0          19196         0                 0
    PendingRangeCalculator            0         0             23         0                 0
    CompactionExecutor                0         0          61629         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             63         0                 0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    PAGED_RANGE                  0
    BINARY                       0
    READ                       640
    MUTATION                     0
    _TRACE                       0
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0

Bad node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                        32       113          52216         0                 0
    RequestResponseStage              0         0           4167         0                 0
    MutationStage                     0         0         127559         0                 0
    ReadRepairStage                   0         0            125         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0           9965         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0              0         0                 0
    MemoryMeter                       0         0             24         0                 0
    FlushWriter                       0         0             27         0                 1
    ValidationExecutor                0         0              0         0                 0
    InternalResponseStage             0         0              0         0                 0
    AntiEntropyStage                  0         0              0         0                 0
    MemtablePostFlusher               0         0             96         0                 0
    MiscStage                         0         0              0         0                 0
    PendingRangeCalculator            0         0             10         0                 0
    CompactionExecutor                1         1             73         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             15         0                 0

    Message type           Dropped
    RANGE_SLICE                130
    READ_REPAIR                  1
    PAGED_RANGE                  0
    BINARY                       0
    READ                     31032
    MUTATION                   865
    _TRACE                       0
    REQUEST_RESPONSE             7
    COUNTER_MUTATION             0


[1] `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png




  


  


  


  

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
I think I was incorrect in assuming GC wasn't an issue due to the lack of
logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
differences, though comparing the startup flags on the two machines shows
the GC config is identical:

$ jstat -gcutil
   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
2  5.08   0.00  55.72  18.24  59.90  25986  619.827    28    1.597  621.424
3  0.00   0.00  22.79  17.87  59.99 422600 11225.979   668   57.383 11283.361

Here's typical output for iostat on nodes 2 & 3 as well:

$ iostat -dmx md0

  Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
2 md0               0.00     0.00  339.00    0.00     9.77     0.00    59.00     0.00    0.00    0.00    0.00   0.00   0.00
3 md0               0.00     0.00 2069.00    1.00    85.85     0.00    84.94     0.00    0.00    0.00    0.00   0.00   0.00
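If it's useful, here's a sketch of how I'd sample both over time rather than as one-off snapshots (assuming $CASSANDRA_PID holds the Cassandra JVM's pid and md0 is the data array on each node):

    # GC utilisation and collection counts every second, ten samples
    jstat -gcutil $CASSANDRA_PID 1000 10
    # extended device stats in MB/s, every 5 seconds, three samples
    iostat -dmx md0 5 3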

Griff

On 13 January 2016 at 18:36, Anuj Wadehra <an...@yahoo.co.in> wrote:

> Node 2 has slightly higher data but that should be ok. Not sure how read
> ops are so high when no IO intensive activity such as repair and compaction
> is running on node 3.May be you can try investigating logs to see whats
> happening.
>
> Others on the mailing list could also share their views on the situation.
>
> Thanks
> Anuj
>
>
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
> <ja...@idioplatform.com> wrote:
> Hi Anuj,
>
> Below is the output of nodetool status. The nodes were replaced following
> the instructions in Datastax documentation for replacing running nodes
> since the nodes were running fine, it was that the servers had been
> incorrectly initialised and they thus had less disk space. The status below
> shows 2 has significantly higher load, however as I say 2 is operating
> normally and is running compactions, so I guess that's not an issue?
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns   Host ID
>           Rack
> UN  1               253.59 GB  256     31.7%
>  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
> UN  2               302.23 GB  256     35.3%
>  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
> UN  3               265.02 GB  256     33.1%
>  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>
> Griff
>
> On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:
>
>> Hi,
>>
>> Revisiting the thread I can see that nodetool status had both good and
>> bad nodes at same time. How do you replace nodes? When you say bad node..I
>> understand that the node is no more usable even though Cassandra is UP? Is
>> that correct?
>>
>> If a node is in bad shape and not working, adding new node may trigger
>> streaming huge data from bad node too. Have you considered using the
>> procedure for replacing a dead node?
>>
>> Please share Latest nodetool status.
>>
>> nodetool output shared earlier:
>>
>>  `nodetool status` output:
>>
>>     Status=Up/Down
>>     |/ State=Normal/Leaving/Joining/Moving
>>     --  Address         Load       Tokens  Owns   Host
>> ID                               Rack
>>     UN  A (Good)        252.37 GB  256     23.0%
>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>     UN  B (Good)        245.91 GB  256     24.4%
>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>     UN  C (Good)        254.79 GB  256     23.7%
>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>     UN  D (Bad)         163.85 GB  256     28.8%
>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>> <ja...@idioplatform.com> wrote:
>> Hi all,
>>
>> We’ve spent a few days running things but are in the same position. To
>> add some more flavour:
>>
>>
>>    - We have a 3-node ring, replication factor = 3. We’ve been running
>>    in this configuration for a few years without any real issues
>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>    brought in to replace two other nodes which had failed RAID0 configuration
>>    and thus were lacking in disk space.
>>    - When node 2 was brought into the ring, it exhibited high CPU wait,
>>    IO and load metrics
>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>    levels. Those same stats on 3, however, sky-rocketed
>>    - We’ve confirmed configuration across all three nodes are identical
>>    and in line with the recommended production settings
>>    - We’ve run a full repair
>>    - Node 2 is currently running compactions, 1 & 3 aren’t and have no
>>    pending
>>    - There is no GC happening from what I can see. Node 1 has a GC log,
>>    but that’s not been written to since May last year
>>
>>
>> What we’re seeing at the moment is similar and normal stats on nodes 1 &
>> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>
>>
>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>
>>
>> Can you recommend any next steps?
>>
>> Griff
>>
>> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
>>
>>> Hi Vickrum,
>>>
>>> I would have proceeded with diagnosis as follows:
>>>
>>> 1. Analysis of sar report to check system health -cpu memory swap disk
>>> etc.
>>> System seems to be overloaded. This is evident from mutation drops.
>>>
>>> 2. Make sure that  all recommended Cassandra production settings
>>> available at Datastax site are applied ,disable zone reclaim and THP.
>>>
>>> 3.Run full Repair on bad node and check data size. Node is owner of
>>> maximum token range but has significant lower data.I doubt that
>>> bootstrapping happened properly.
>>>
>>> 4.Compactionstats shows 22 pending compactions. Try throttling
>>> compactions via reducing cincurent compactors or compaction throughput.
>>>
>>> 5.Analyze logs to make sure bootstrapping happened without errors.
>>>
>>> 6. Look for other common performance problems such as GC pauses to make
>>> sure that dropped mutations are not caused by GC pauses.
>>>
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>> <vi...@idioplatform.com> wrote:
>>> # nodetool compactionstats
>>> pending tasks: 22
>>>           compaction type        keyspace           table
>>> completed           total      unit  progress
>>>                Compactionproduction_analytics    interactions
>>> 240410213    161172668724     bytes     0.15%
>>>
>>> Compactionproduction_decisionsdecisions.decisions_q_idx
>>> 120815385       226295183     bytes    53.39%
>>> Active compaction remaining time :   2h39m58s
>>>
>>> Worth mentioning that compactions haven't been running on this node
>>> particularly often. The node's been performing badly regardless of whether
>>> it's compacting or not.
>>>
>>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>>
>>>> What’s your output of `nodetool compactionstats`?
>>>>
>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We recently added a new node to our cluster in order to replace a node
>>>> that died (hardware failure we believe). For the next two weeks it had high
>>>> disk and network activity. We replaced the server, but it's happened again.
>>>> We've looked into memory allowances, disk performance, number of
>>>> connections, and all the nodetool stats, but can't find the cause of the
>>>> issue.
>>>>
>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>>> cause.
>>>>
>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>> node (D) has less data.
>>>>
>>>> Disk Activity[2] and Network activity[3] on this node is far higher
>>>> than the rest.
>>>>
>>>> The only other difference this node has to the rest of the cluster is
>>>> that its on the ext4 filesystem, whereas the rest are ext3, but we've done
>>>> plenty of testing there and can't see how that would affect performance on
>>>> this node so much.
>>>>
>>>> Nothing of note in system.log.
>>>>
>>>> What should our next step be in trying to diagnose this issue?
>>>>
>>>> Best wishes,
>>>> Vic
>>>>
>>>> [0] `nodetool tpstats` output:
>>>>
>>>> Good node:
>>>>     Pool Name                    Active   Pending      Completed
>>>> Blocked  All time blocked
>>>>     ReadStage                         0         0
>>>> 46311521         0                 0
>>>>     RequestResponseStage              0         0
>>>> 23817366         0                 0
>>>>     MutationStage                     0         0
>>>> 47389269         0                 0
>>>>     ReadRepairStage                   0         0
>>>> 11108         0                 0
>>>>     ReplicateOnWriteStage             0         0
>>>> 0         0                 0
>>>>     GossipStage                       0         0
>>>> 5259908         0                 0
>>>>     CacheCleanupExecutor              0         0
>>>> 0         0                 0
>>>>     MigrationStage                    0         0
>>>> 30         0                 0
>>>>     MemoryMeter                       0         0
>>>> 16563         0                 0
>>>>     FlushWriter                       0         0
>>>> 39637         0                26
>>>>     ValidationExecutor                0         0
>>>> 19013         0                 0
>>>>     InternalResponseStage             0         0
>>>> 9         0                 0
>>>>     AntiEntropyStage                  0         0
>>>> 38026         0                 0
>>>>     MemtablePostFlusher               0         0
>>>> 81740         0                 0
>>>>     MiscStage                         0         0
>>>> 19196         0                 0
>>>>     PendingRangeCalculator            0         0
>>>> 23         0                 0
>>>>     CompactionExecutor                0         0
>>>> 61629         0                 0
>>>>     commitlog_archiver                0         0
>>>> 0         0                 0
>>>>     HintedHandoff                     0         0
>>>> 63         0                 0
>>>>
>>>>     Message type           Dropped
>>>>     RANGE_SLICE                  0
>>>>     READ_REPAIR                  0
>>>>     PAGED_RANGE                  0
>>>>     BINARY                       0
>>>>     READ                       640
>>>>     MUTATION                     0
>>>>     _TRACE                       0
>>>>     REQUEST_RESPONSE             0
>>>>     COUNTER_MUTATION             0
>>>>
>>>> Bad node:
>>>>     Pool Name                    Active   Pending      Completed
>>>> Blocked  All time blocked
>>>>     ReadStage                        32       113
>>>> 52216         0                 0
>>>>     RequestResponseStage              0         0
>>>> 4167         0                 0
>>>>     MutationStage                     0         0
>>>> 127559         0                 0
>>>>     ReadRepairStage                   0         0
>>>> 125         0                 0
>>>>     ReplicateOnWriteStage             0         0
>>>> 0         0                 0
>>>>     GossipStage                       0         0
>>>> 9965         0                 0
>>>>     CacheCleanupExecutor              0         0
>>>> 0         0                 0
>>>>     MigrationStage                    0         0
>>>> 0         0                 0
>>>>     MemoryMeter                       0         0
>>>> 24         0                 0
>>>>     FlushWriter                       0         0
>>>> 27         0                 1
>>>>     ValidationExecutor                0         0
>>>> 0         0                 0
>>>>     InternalResponseStage             0         0
>>>> 0         0                 0
>>>>     AntiEntropyStage                  0         0
>>>> 0         0                 0
>>>>     MemtablePostFlusher               0         0
>>>> 96         0                 0
>>>>     MiscStage                         0         0
>>>> 0         0                 0
>>>>     PendingRangeCalculator            0         0
>>>> 10         0                 0
>>>>     CompactionExecutor                1         1
>>>> 73         0                 0
>>>>     commitlog_archiver                0         0
>>>> 0         0                 0
>>>>     HintedHandoff                     0         0
>>>> 15         0                 0
>>>>
>>>>     Message type           Dropped
>>>>     RANGE_SLICE                130
>>>>     READ_REPAIR                  1
>>>>     PAGED_RANGE                  0
>>>>     BINARY                       0
>>>>     READ                     31032
>>>>     MUTATION                   865
>>>>     _TRACE                       0
>>>>     REQUEST_RESPONSE             7
>>>>     COUNTER_MUTATION             0
>>>>
>>>>
>>>> [1] `nodetool status` output:
>>>>
>>>>     Status=Up/Down
>>>>     |/ State=Normal/Leaving/Joining/Moving
>>>>     --  Address         Load       Tokens  Owns   Host
>>>> ID                               Rack
>>>>     UN  A (Good)        252.37 GB  256     23.0%
>>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>     UN  B (Good)        245.91 GB  256     24.4%
>>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>     UN  C (Good)        254.79 GB  256     23.7%
>>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>
>>>> [2] Disk read/write ops:
>>>>
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>
>>>> [3] Network in/out:
>>>>
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>
>>>>
>>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Node 2 has slightly higher data but that should be OK. Not sure how read ops are so high when no IO-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating the logs to see what's happening.
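In case it helps, a few quick checks that might show where node 3's reads are coming from (a rough sketch; iotop usually has to be installed separately):

    # is anything streaming to or from the node?
    nodetool netstats
    # per-table read counts and latencies, to see which tables take the reads
    nodetool cfstats
    # which processes/threads are actually issuing the disk reads
    iotop -o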
Others on the mailing list could also share their views on the situation.

Thanks
Anuj


Sent from Yahoo Mail on Android 
 
  On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin<ja...@idioplatform.com> wrote:   Hi Anuj, 
Below is the output of nodetool status. The nodes were replaced following the instructions in Datastax documentation for replacing running nodes since the nodes were running fine, it was that the servers had been incorrectly initialised and they thus had less disk space. The status below shows 2 has significantly higher load, however as I say 2 is operating normally and is running compactions, so I guess that's not an issue?
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  1               253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2               302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3               265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
Griff

On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:

Hi,
Revisiting the thread I can see that nodetool status had both good and bad nodes at same time. How do you replace nodes? When you say bad node..I understand that the node is no more usable even though Cassandra is UP? Is that correct?
If a node is in bad shape and not working, adding new node may trigger streaming huge data from bad node too. Have you considered using the procedure for replacing a dead node?
Please share Latest nodetool status.
nodetool output shared earlier:
 `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1



Thanks
Anuj
Sent from Yahoo Mail on Android 
 
 On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin<ja...@idioplatform.com> wrote:   Hi all, 
We’ve spent a few days running things but are in the same position. To add some more flavour:
   
   - We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
   - Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had failed RAID0 configuration and thus were lacking in disk space.
   - When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
   - We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
   - We’ve confirmed configuration across all three nodes are identical and in line with the recommended production settings
   - We’ve run a full repair
   - Node 2 is currently running compactions, 1 & 3 aren’t and have no pending
   - There is no GC happening from what I can see. Node 1 has a GC log, but that’s not been written to since May last year

What we’re seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
   
   1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
   2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
   3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s

Can you recommend any next steps? 
Griff

On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:

Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analyze a sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure that all the recommended Cassandra production settings available on the Datastax site are applied; in particular, disable zone reclaim and THP.
3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyze the logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems, such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
 On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi<vi...@idioplatform.com> wrote:   # nodetool compactionstats
pending tasks: 22
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction  production_analytics               interactions       240410213    161172668724     bytes     0.15%
               Compaction  production_decisions  decisions.decisions_q_idx       120815385       226295183     bytes    53.39%
Active compaction remaining time :   2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:

What’s your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk Activity[2] and Network activity[3] on this node is far higher than the rest.

The only other difference this node has to the rest of the cluster is that its on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic

[0] `nodetool tpstats` output:

Good node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                         0         0       46311521         0                 0
    RequestResponseStage              0         0       23817366         0                 0
    MutationStage                     0         0       47389269         0                 0
    ReadRepairStage                   0         0          11108         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0        5259908         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0             30         0                 0
    MemoryMeter                       0         0          16563         0                 0
    FlushWriter                       0         0          39637         0                26
    ValidationExecutor                0         0          19013         0                 0
    InternalResponseStage             0         0              9         0                 0
    AntiEntropyStage                  0         0          38026         0                 0
    MemtablePostFlusher               0         0          81740         0                 0
    MiscStage                         0         0          19196         0                 0
    PendingRangeCalculator            0         0             23         0                 0
    CompactionExecutor                0         0          61629         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             63         0                 0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    PAGED_RANGE                  0
    BINARY                       0
    READ                       640
    MUTATION                     0
    _TRACE                       0
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0

Bad node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                        32       113          52216         0                 0
    RequestResponseStage              0         0           4167         0                 0
    MutationStage                     0         0         127559         0                 0
    ReadRepairStage                   0         0            125         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0           9965         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0              0         0                 0
    MemoryMeter                       0         0             24         0                 0
    FlushWriter                       0         0             27         0                 1
    ValidationExecutor                0         0              0         0                 0
    InternalResponseStage             0         0              0         0                 0
    AntiEntropyStage                  0         0              0         0                 0
    MemtablePostFlusher               0         0             96         0                 0
    MiscStage                         0         0              0         0                 0
    PendingRangeCalculator            0         0             10         0                 0
    CompactionExecutor                1         1             73         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             15         0                 0

    Message type           Dropped
    RANGE_SLICE                130
    READ_REPAIR                  1
    PAGED_RANGE                  0
    BINARY                       0
    READ                     31032
    MUTATION                   865
    _TRACE                       0
    REQUEST_RESPONSE             7
    COUNTER_MUTATION             0


[1] `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png




  


  


  

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
Hi Anuj,

Below is the output of nodetool status. The nodes were replaced following
the instructions in the Datastax documentation for replacing running nodes,
since the nodes themselves were running fine; it was just that the servers
had been incorrectly initialised and thus had less disk space. The status
below shows node 2 has significantly higher load; however, as I say, node 2
is operating normally and is running compactions, so I guess that's not an
issue?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  1               253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2               302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3               265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1

Griff

On 13 January 2016 at 18:12, Anuj Wadehra <an...@yahoo.co.in> wrote:

> Hi,
>
> Revisiting the thread I can see that nodetool status had both good and bad
> nodes at same time. How do you replace nodes? When you say bad node..I
> understand that the node is no more usable even though Cassandra is UP? Is
> that correct?
>
> If a node is in bad shape and not working, adding new node may trigger
> streaming huge data from bad node too. Have you considered using the
> procedure for replacing a dead node?
>
> Please share Latest nodetool status.
>
> nodetool output shared earlier:
>
>  `nodetool status` output:
>
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens  Owns   Host
> ID                               Rack
>     UN  A (Good)        252.37 GB  256     23.0%
> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>     UN  B (Good)        245.91 GB  256     24.4%
> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>     UN  C (Good)        254.79 GB  256     23.7%
> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>     UN  D (Bad)         163.85 GB  256     28.8%
> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
> <ja...@idioplatform.com> wrote:
> Hi all,
>
> We’ve spent a few days running things but are in the same position. To add
> some more flavour:
>
>
>    - We have a 3-node ring, replication factor = 3. We’ve been running in
>    this configuration for a few years without any real issues
>    - Nodes 2 & 3 are much newer than node 1. These two nodes were brought
>    in to replace two other nodes which had failed RAID0 configuration and thus
>    were lacking in disk space.
>    - When node 2 was brought into the ring, it exhibited high CPU wait,
>    IO and load metrics
>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>    levels. Those same stats on 3, however, sky-rocketed
>    - We’ve confirmed configuration across all three nodes are identical
>    and in line with the recommended production settings
>    - We’ve run a full repair
>    - Node 2 is currently running compactions, 1 & 3 aren’t and have no
>    pending
>    - There is no GC happening from what I can see. Node 1 has a GC log,
>    but that’s not been written to since May last year
>
>
> What we’re seeing at the moment is similar and normal stats on nodes 1 &
> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>
>
>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>
>
> Can you recommend any next steps?
>
> Griff
>
> On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:
>
>> Hi Vickrum,
>>
>> I would have proceeded with diagnosis as follows:
>>
>> 1. Analysis of sar report to check system health -cpu memory swap disk
>> etc.
>> System seems to be overloaded. This is evident from mutation drops.
>>
>> 2. Make sure that  all recommended Cassandra production settings
>> available at Datastax site are applied ,disable zone reclaim and THP.
>>
>> 3.Run full Repair on bad node and check data size. Node is owner of
>> maximum token range but has significant lower data.I doubt that
>> bootstrapping happened properly.
>>
>> 4.Compactionstats shows 22 pending compactions. Try throttling
>> compactions via reducing cincurent compactors or compaction throughput.
>>
>> 5.Analyze logs to make sure bootstrapping happened without errors.
>>
>> 6. Look for other common performance problems such as GC pauses to make
>> sure that dropped mutations are not caused by GC pauses.
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>> <vi...@idioplatform.com> wrote:
>> # nodetool compactionstats
>> pending tasks: 22
>>           compaction type        keyspace           table
>> completed           total      unit  progress
>>                Compactionproduction_analytics    interactions
>> 240410213    161172668724     bytes     0.15%
>>
>> Compactionproduction_decisionsdecisions.decisions_q_idx
>> 120815385       226295183     bytes    53.39%
>> Active compaction remaining time :   2h39m58s
>>
>> Worth mentioning that compactions haven't been running on this node
>> particularly often. The node's been performing badly regardless of whether
>> it's compacting or not.
>>
>> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>>
>>> What’s your output of `nodetool compactionstats`?
>>>
>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> We recently added a new node to our cluster in order to replace a node
>>> that died (hardware failure we believe). For the next two weeks it had high
>>> disk and network activity. We replaced the server, but it's happened again.
>>> We've looked into memory allowances, disk performance, number of
>>> connections, and all the nodetool stats, but can't find the cause of the
>>> issue.
>>>
>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>> comparison to the rest of the cluster, but that's likely a symptom, not a
>>> cause.
>>>
>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>> node (D) has less data.
>>>
>>> Disk Activity[2] and Network activity[3] on this node is far higher than
>>> the rest.
>>>
>>> The only other difference this node has to the rest of the cluster is
>>> that its on the ext4 filesystem, whereas the rest are ext3, but we've done
>>> plenty of testing there and can't see how that would affect performance on
>>> this node so much.
>>>
>>> Nothing of note in system.log.
>>>
>>> What should our next step be in trying to diagnose this issue?
>>>
>>> Best wishes,
>>> Vic
>>>
>>> [0] `nodetool tpstats` output:
>>>
>>> Good node:
>>>     Pool Name                    Active   Pending      Completed
>>> Blocked  All time blocked
>>>     ReadStage                         0         0       46311521
>>> 0                 0
>>>     RequestResponseStage              0         0       23817366
>>> 0                 0
>>>     MutationStage                     0         0       47389269
>>> 0                 0
>>>     ReadRepairStage                   0         0          11108
>>> 0                 0
>>>     ReplicateOnWriteStage             0         0              0
>>> 0                 0
>>>     GossipStage                       0         0        5259908
>>> 0                 0
>>>     CacheCleanupExecutor              0         0              0
>>> 0                 0
>>>     MigrationStage                    0         0             30
>>> 0                 0
>>>     MemoryMeter                       0         0          16563
>>> 0                 0
>>>     FlushWriter                       0         0          39637
>>> 0                26
>>>     ValidationExecutor                0         0          19013
>>> 0                 0
>>>     InternalResponseStage             0         0              9
>>> 0                 0
>>>     AntiEntropyStage                  0         0          38026
>>> 0                 0
>>>     MemtablePostFlusher               0         0          81740
>>> 0                 0
>>>     MiscStage                         0         0          19196
>>> 0                 0
>>>     PendingRangeCalculator            0         0             23
>>> 0                 0
>>>     CompactionExecutor                0         0          61629
>>> 0                 0
>>>     commitlog_archiver                0         0              0
>>> 0                 0
>>>     HintedHandoff                     0         0             63
>>> 0                 0
>>>
>>>     Message type           Dropped
>>>     RANGE_SLICE                  0
>>>     READ_REPAIR                  0
>>>     PAGED_RANGE                  0
>>>     BINARY                       0
>>>     READ                       640
>>>     MUTATION                     0
>>>     _TRACE                       0
>>>     REQUEST_RESPONSE             0
>>>     COUNTER_MUTATION             0
>>>
>>> Bad node:
>>>     Pool Name                    Active   Pending      Completed
>>> Blocked  All time blocked
>>>     ReadStage                        32       113          52216
>>> 0                 0
>>>     RequestResponseStage              0         0           4167
>>> 0                 0
>>>     MutationStage                     0         0         127559
>>> 0                 0
>>>     ReadRepairStage                   0         0            125
>>> 0                 0
>>>     ReplicateOnWriteStage             0         0              0
>>> 0                 0
>>>     GossipStage                       0         0           9965
>>> 0                 0
>>>     CacheCleanupExecutor              0         0              0
>>> 0                 0
>>>     MigrationStage                    0         0              0
>>> 0                 0
>>>     MemoryMeter                       0         0             24
>>> 0                 0
>>>     FlushWriter                       0         0             27
>>> 0                 1
>>>     ValidationExecutor                0         0              0
>>> 0                 0
>>>     InternalResponseStage             0         0              0
>>> 0                 0
>>>     AntiEntropyStage                  0         0              0
>>> 0                 0
>>>     MemtablePostFlusher               0         0             96
>>> 0                 0
>>>     MiscStage                         0         0              0
>>> 0                 0
>>>     PendingRangeCalculator            0         0             10
>>> 0                 0
>>>     CompactionExecutor                1         1             73
>>> 0                 0
>>>     commitlog_archiver                0         0              0
>>> 0                 0
>>>     HintedHandoff                     0         0             15
>>> 0                 0
>>>
>>>     Message type           Dropped
>>>     RANGE_SLICE                130
>>>     READ_REPAIR                  1
>>>     PAGED_RANGE                  0
>>>     BINARY                       0
>>>     READ                     31032
>>>     MUTATION                   865
>>>     _TRACE                       0
>>>     REQUEST_RESPONSE             7
>>>     COUNTER_MUTATION             0
>>>
>>>
>>> [1] `nodetool status` output:
>>>
>>>     Status=Up/Down
>>>     |/ State=Normal/Leaving/Joining/Moving
>>>     --  Address         Load       Tokens  Owns   Host
>>> ID                               Rack
>>>     UN  A (Good)        252.37 GB  256     23.0%
>>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>     UN  B (Good)        245.91 GB  256     24.4%
>>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>     UN  C (Good)        254.79 GB  256     23.7%
>>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>     UN  D (Bad)         163.85 GB  256     28.8%
>>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>
>>> [2] Disk read/write ops:
>>>
>>>
>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>
>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>
>>> [3] Network in/out:
>>>
>>>
>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>
>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>
>>>
>>>
>>
>

Re: New node has high network and disk usage.

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Hi,
Revisiting the thread, I can see that nodetool status had both good and bad nodes at the same time. How do you replace nodes? When you say "bad node", I understand that the node is no longer usable even though Cassandra is UP? Is that correct?
If a node is in bad shape and not working, adding a new node may trigger streaming huge amounts of data from the bad node too. Have you considered using the procedure for replacing a dead node?
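For reference, a sketch of the dead-node replacement procedure (this assumes you are on Cassandra 1.2.11+/2.x, where the replace_address option exists; check the Datastax docs for your exact version, and treat the address below as a placeholder):

    # on the replacement node, before its first start, add to cassandra-env.sh:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<address_of_dead_node>"
    # start Cassandra; it bootstraps the dead node's ranges from the live replicas.
    # once it has joined, remove the option and confirm the old host ID is gone:
    nodetool status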
Please share the latest nodetool status.
nodetool output shared earlier:
 `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1



Thanks
Anuj
Sent from Yahoo Mail on Android 
 
  On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin<ja...@idioplatform.com> wrote:   Hi all, 
We’ve spent a few days running things but are in the same position. To add some more flavour:
   
   - We have a 3-node ring, replication factor = 3. We’ve been running in this configuration for a few years without any real issues
   - Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had failed RAID0 configuration and thus were lacking in disk space.
   - When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
   - We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
   - We’ve confirmed configuration across all three nodes are identical and in line with the recommended production settings
   - We’ve run a full repair
   - Node 2 is currently running compactions, 1 & 3 aren’t and have no pending
   - There is no GC happening from what I can see. Node 1 has a GC log, but that’s not been written to since May last year

What we’re seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
   
   1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
   2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
   3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s

Can you recommend any next steps? 
Griff

On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:

Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analyze a sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure that all the recommended Cassandra production settings available on the Datastax site are applied; in particular, disable zone reclaim and THP.
3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyze the logs to make sure bootstrapping happened without errors.
6. Look for other common performance problems, such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
 On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi<vi...@idioplatform.com> wrote:   # nodetool compactionstats
pending tasks: 22
          compaction type        keyspace           table       completed           total      unit  progress
               Compaction  production_analytics               interactions       240410213    161172668724     bytes     0.15%
               Compaction  production_decisions  decisions.decisions_q_idx       120815385       226295183     bytes    53.39%
Active compaction remaining time :   2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:

What’s your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk Activity[2] and Network activity[3] on this node is far higher than the rest.

The only other difference this node has to the rest of the cluster is that its on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic

[0] `nodetool tpstats` output:

Good node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                         0         0       46311521         0                 0
    RequestResponseStage              0         0       23817366         0                 0
    MutationStage                     0         0       47389269         0                 0
    ReadRepairStage                   0         0          11108         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0        5259908         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0             30         0                 0
    MemoryMeter                       0         0          16563         0                 0
    FlushWriter                       0         0          39637         0                26
    ValidationExecutor                0         0          19013         0                 0
    InternalResponseStage             0         0              9         0                 0
    AntiEntropyStage                  0         0          38026         0                 0
    MemtablePostFlusher               0         0          81740         0                 0
    MiscStage                         0         0          19196         0                 0
    PendingRangeCalculator            0         0             23         0                 0
    CompactionExecutor                0         0          61629         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             63         0                 0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    PAGED_RANGE                  0
    BINARY                       0
    READ                       640
    MUTATION                     0
    _TRACE                       0
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0

Bad node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                        32       113          52216         0                 0
    RequestResponseStage              0         0           4167         0                 0
    MutationStage                     0         0         127559         0                 0
    ReadRepairStage                   0         0            125         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0           9965         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0              0         0                 0
    MemoryMeter                       0         0             24         0                 0
    FlushWriter                       0         0             27         0                 1
    ValidationExecutor                0         0              0         0                 0
    InternalResponseStage             0         0              0         0                 0
    AntiEntropyStage                  0         0              0         0                 0
    MemtablePostFlusher               0         0             96         0                 0
    MiscStage                         0         0              0         0                 0
    PendingRangeCalculator            0         0             10         0                 0
    CompactionExecutor                1         1             73         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             15         0                 0

    Message type           Dropped
    RANGE_SLICE                130
    READ_REPAIR                  1
    PAGED_RANGE                  0
    BINARY                       0
    READ                     31032
    MUTATION                   865
    _TRACE                       0
    REQUEST_RESPONSE             7
    COUNTER_MUTATION             0


[1] `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png




  


  

Re: New node has high network and disk usage.

Posted by James Griffin <ja...@idioplatform.com>.
 Hi all,

We’ve spent a few days running things but are in the same position. To add
some more flavour:


   - We have a 3-node ring, replication factor = 3. We’ve been running in
   this configuration for a few years without any real issues
   - Nodes 2 & 3 are much newer than node 1. These two nodes were brought
   in to replace two other nodes whose RAID0 arrays had failed, leaving them
   short of disk space.
   - When node 2 was brought into the ring, it exhibited high CPU wait, IO
   and load metrics
   - We subsequently brought 3 into the ring: as soon as 3 was fully
   bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
   levels. Those same stats on 3, however, sky-rocketed
   - We’ve confirmed the configuration across all three nodes is identical and
   in line with the recommended production settings
   - We’ve run a full repair
   - Node 2 is currently running compactions; nodes 1 & 3 aren’t and have
   none pending
   - There is no problematic GC activity from what I can see. Node 1 has a
   GC log, but that hasn’t been written to since May last year


What we’re seeing at the moment is similar and normal stats on nodes 1 & 2,
but high CPU wait, IO and load stats on 3. As a snapshot:


   1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
   2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
   3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s


Can you recommend any next steps?

Griff

On 6 January 2016 at 17:31, Anuj Wadehra <an...@yahoo.co.in> wrote:

> Hi Vickrum,
>
> I would have proceeded with diagnosis as follows:
>
> 1. Analysis of sar report to check system health -cpu memory swap disk
> etc.
> System seems to be overloaded. This is evident from mutation drops.
>
> 2. Make sure that  all recommended Cassandra production settings available
> at Datastax site are applied ,disable zone reclaim and THP.
>
> 3.Run full Repair on bad node and check data size. Node is owner of
> maximum token range but has significant lower data.I doubt that
> bootstrapping happened properly.
>
> 4.Compactionstats shows 22 pending compactions. Try throttling compactions
> via reducing cincurent compactors or compaction throughput.
>
> 5.Analyze logs to make sure bootstrapping happened without errors.
>
> 6. Look for other common performance problems such as GC pauses to make
> sure that dropped mutations are not caused by GC pauses.
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
> <vi...@idioplatform.com> wrote:
> # nodetool compactionstats
> pending tasks: 22
>           compaction type        keyspace           table
> completed           total      unit  progress
>                Compactionproduction_analytics    interactions
> 240410213    161172668724     bytes     0.15%
>
> Compactionproduction_decisionsdecisions.decisions_q_idx
> 120815385       226295183     bytes    53.39%
> Active compaction remaining time :   2h39m58s
>
> Worth mentioning that compactions haven't been running on this node
> particularly often. The node's been performing badly regardless of whether
> it's compacting or not.
>
> On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:
>
>> What’s your output of `nodetool compactionstats`?
>>
>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
>> wrote:
>>
>> Hi,
>>
>> We recently added a new node to our cluster in order to replace a node
>> that died (hardware failure we believe). For the next two weeks it had high
>> disk and network activity. We replaced the server, but it's happened again.
>> We've looked into memory allowances, disk performance, number of
>> connections, and all the nodetool stats, but can't find the cause of the
>> issue.
>>
>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>> comparison to the rest of the cluster, but that's likely a symptom, not a
>> cause.
>>
>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node
>> (D) has less data.
>>
>> Disk Activity[2] and Network activity[3] on this node is far higher than
>> the rest.
>>
>> The only other difference this node has to the rest of the cluster is
>> that its on the ext4 filesystem, whereas the rest are ext3, but we've done
>> plenty of testing there and can't see how that would affect performance on
>> this node so much.
>>
>> Nothing of note in system.log.
>>
>> What should our next step be in trying to diagnose this issue?
>>
>> Best wishes,
>> Vic
>>
>> [0] `nodetool tpstats` output:
>>
>> Good node:
>>     Pool Name                    Active   Pending      Completed
>> Blocked  All time blocked
>>     ReadStage                         0         0       46311521
>> 0                 0
>>     RequestResponseStage              0         0       23817366
>> 0                 0
>>     MutationStage                     0         0       47389269
>> 0                 0
>>     ReadRepairStage                   0         0          11108
>> 0                 0
>>     ReplicateOnWriteStage             0         0              0
>> 0                 0
>>     GossipStage                       0         0        5259908
>> 0                 0
>>     CacheCleanupExecutor              0         0              0
>> 0                 0
>>     MigrationStage                    0         0             30
>> 0                 0
>>     MemoryMeter                       0         0          16563
>> 0                 0
>>     FlushWriter                       0         0          39637
>> 0                26
>>     ValidationExecutor                0         0          19013
>> 0                 0
>>     InternalResponseStage             0         0              9
>> 0                 0
>>     AntiEntropyStage                  0         0          38026
>> 0                 0
>>     MemtablePostFlusher               0         0          81740
>> 0                 0
>>     MiscStage                         0         0          19196
>> 0                 0
>>     PendingRangeCalculator            0         0             23
>> 0                 0
>>     CompactionExecutor                0         0          61629
>> 0                 0
>>     commitlog_archiver                0         0              0
>> 0                 0
>>     HintedHandoff                     0         0             63
>> 0                 0
>>
>>     Message type           Dropped
>>     RANGE_SLICE                  0
>>     READ_REPAIR                  0
>>     PAGED_RANGE                  0
>>     BINARY                       0
>>     READ                       640
>>     MUTATION                     0
>>     _TRACE                       0
>>     REQUEST_RESPONSE             0
>>     COUNTER_MUTATION             0
>>
>> Bad node:
>>     Pool Name                    Active   Pending      Completed
>> Blocked  All time blocked
>>     ReadStage                        32       113          52216
>> 0                 0
>>     RequestResponseStage              0         0           4167
>> 0                 0
>>     MutationStage                     0         0         127559
>> 0                 0
>>     ReadRepairStage                   0         0            125
>> 0                 0
>>     ReplicateOnWriteStage             0         0              0
>> 0                 0
>>     GossipStage                       0         0           9965
>> 0                 0
>>     CacheCleanupExecutor              0         0              0
>> 0                 0
>>     MigrationStage                    0         0              0
>> 0                 0
>>     MemoryMeter                       0         0             24
>> 0                 0
>>     FlushWriter                       0         0             27
>> 0                 1
>>     ValidationExecutor                0         0              0
>> 0                 0
>>     InternalResponseStage             0         0              0
>> 0                 0
>>     AntiEntropyStage                  0         0              0
>> 0                 0
>>     MemtablePostFlusher               0         0             96
>> 0                 0
>>     MiscStage                         0         0              0
>> 0                 0
>>     PendingRangeCalculator            0         0             10
>> 0                 0
>>     CompactionExecutor                1         1             73
>> 0                 0
>>     commitlog_archiver                0         0              0
>> 0                 0
>>     HintedHandoff                     0         0             15
>> 0                 0
>>
>>     Message type           Dropped
>>     RANGE_SLICE                130
>>     READ_REPAIR                  1
>>     PAGED_RANGE                  0
>>     BINARY                       0
>>     READ                     31032
>>     MUTATION                   865
>>     _TRACE                       0
>>     REQUEST_RESPONSE             7
>>     COUNTER_MUTATION             0
>>
>>
>> [1] `nodetool status` output:
>>
>>     Status=Up/Down
>>     |/ State=Normal/Leaving/Joining/Moving
>>     --  Address         Load       Tokens  Owns   Host
>> ID                               Rack
>>     UN  A (Good)        252.37 GB  256     23.0%
>> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>     UN  B (Good)        245.91 GB  256     24.4%
>> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>     UN  C (Good)        254.79 GB  256     23.7%
>> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>     UN  D (Bad)         163.85 GB  256     28.8%
>> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>
>> [2] Disk read/write ops:
>>
>>
>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>
>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>
>> [3] Network in/out:
>>
>>
>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>
>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>
>>
>>
>

Re: New node has high network and disk usage.

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Hi Vickrum,
I would have proceeded with diagnosis as follows:
1. Analyze the sar reports to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
2. Make sure that all the recommended Cassandra production settings published on the DataStax site are applied; in particular, disable zone reclaim and THP.
3. Run a full repair on the bad node and check its data size. The node owns the largest token range yet holds significantly less data, so I doubt that bootstrapping completed properly.
4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or the compaction throughput.
5. Analyze the logs to make sure bootstrapping completed without errors (a rough sketch of steps 3-5 follows below).
6. Look for other common performance problems, such as GC pauses, to make sure the dropped mutations are not caused by them.
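
A rough sketch of steps 3-5 (the data and log paths below are typical package-install defaults, not confirmed locations on your systems):

    nodetool repair                       # full repair of the ranges this node replicates
    du -sh /var/lib/cassandra/data/*      # compare on-disk data size per keyspace across nodes
    nodetool netstats                     # any streams still running or stuck?
    nodetool setcompactionthroughput 8    # MB/s; below the 16 MB/s default, to throttle compaction IO
    # concurrent_compactors can also be lowered in cassandra.yaml (needs a restart)
    grep -iE "bootstrap|stream|error|exception" /var/log/cassandra/system.log | tail -n 100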

Thanks,
Anuj

Sent from Yahoo Mail on Android 
 
On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vi...@idioplatform.com> wrote:

# nodetool compactionstats
pending tasks: 22
          compaction type   keyspace               table                       completed    total          unit    progress
          Compaction        production_analytics   interactions                240410213    161172668724   bytes   0.15%
          Compaction        production_decisions   decisions.decisions_q_idx   120815385    226295183      bytes   53.39%
Active compaction remaining time :   2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:

What’s your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk Activity[2] and Network activity[3] on this node is far higher than the rest.

The only other difference this node has to the rest of the cluster is that its on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic

[0] `nodetool tpstats` output:

Good node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                         0         0       46311521         0                 0
    RequestResponseStage              0         0       23817366         0                 0
    MutationStage                     0         0       47389269         0                 0
    ReadRepairStage                   0         0          11108         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0        5259908         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0             30         0                 0
    MemoryMeter                       0         0          16563         0                 0
    FlushWriter                       0         0          39637         0                26
    ValidationExecutor                0         0          19013         0                 0
    InternalResponseStage             0         0              9         0                 0
    AntiEntropyStage                  0         0          38026         0                 0
    MemtablePostFlusher               0         0          81740         0                 0
    MiscStage                         0         0          19196         0                 0
    PendingRangeCalculator            0         0             23         0                 0
    CompactionExecutor                0         0          61629         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             63         0                 0

    Message type           Dropped
    RANGE_SLICE                  0
    READ_REPAIR                  0
    PAGED_RANGE                  0
    BINARY                       0
    READ                       640
    MUTATION                     0
    _TRACE                       0
    REQUEST_RESPONSE             0
    COUNTER_MUTATION             0

Bad node:
    Pool Name                    Active   Pending      Completed   Blocked  All time blocked
    ReadStage                        32       113          52216         0                 0
    RequestResponseStage              0         0           4167         0                 0
    MutationStage                     0         0         127559         0                 0
    ReadRepairStage                   0         0            125         0                 0
    ReplicateOnWriteStage             0         0              0         0                 0
    GossipStage                       0         0           9965         0                 0
    CacheCleanupExecutor              0         0              0         0                 0
    MigrationStage                    0         0              0         0                 0
    MemoryMeter                       0         0             24         0                 0
    FlushWriter                       0         0             27         0                 1
    ValidationExecutor                0         0              0         0                 0
    InternalResponseStage             0         0              0         0                 0
    AntiEntropyStage                  0         0              0         0                 0
    MemtablePostFlusher               0         0             96         0                 0
    MiscStage                         0         0              0         0                 0
    PendingRangeCalculator            0         0             10         0                 0
    CompactionExecutor                1         1             73         0                 0
    commitlog_archiver                0         0              0         0                 0
    HintedHandoff                     0         0             15         0                 0

    Message type           Dropped
    RANGE_SLICE                130
    READ_REPAIR                  1
    PAGED_RANGE                  0
    BINARY                       0
    READ                     31032
    MUTATION                   865
    _TRACE                       0
    REQUEST_RESPONSE             7
    COUNTER_MUTATION             0


[1] `nodetool status` output:

    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns   Host ID                               Rack
    UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
    UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
    UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
    UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:

    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
    https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png




  

Re: New node has high network and disk usage.

Posted by Vickrum Loi <vi...@idioplatform.com>.
# nodetool compactionstats
pending tasks: 22
          compaction type   keyspace               table                       completed    total          unit    progress
          Compaction        production_analytics   interactions                240410213    161172668724   bytes   0.15%
          Compaction        production_decisions   decisions.decisions_q_idx   120815385    226295183      bytes   53.39%
Active compaction remaining time :   2h39m58s

Worth mentioning that compactions haven't been running on this node
particularly often. The node's been performing badly regardless of whether
it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <jb...@tubularlabs.com> wrote:

> What’s your output of `nodetool compactionstats`?
>
> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com>
> wrote:
>
> Hi,
>
> We recently added a new node to our cluster in order to replace a node
> that died (hardware failure we believe). For the next two weeks it had high
> disk and network activity. We replaced the server, but it's happened again.
> We've looked into memory allowances, disk performance, number of
> connections, and all the nodetool stats, but can't find the cause of the
> issue.
>
> `nodetool tpstats`[0] shows a lot of active and pending threads, in
> comparison to the rest of the cluster, but that's likely a symptom, not a
> cause.
>
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node
> (D) has less data.
>
> Disk Activity[2] and Network activity[3] on this node is far higher than
> the rest.
>
> The only other difference this node has to the rest of the cluster is that
> its on the ext4 filesystem, whereas the rest are ext3, but we've done
> plenty of testing there and can't see how that would affect performance on
> this node so much.
>
> Nothing of note in system.log.
>
> What should our next step be in trying to diagnose this issue?
>
> Best wishes,
> Vic
>
> [0] `nodetool tpstats` output:
>
> Good node:
>     Pool Name                    Active   Pending      Completed
> Blocked  All time blocked
>     ReadStage                         0         0       46311521
> 0                 0
>     RequestResponseStage              0         0       23817366
> 0                 0
>     MutationStage                     0         0       47389269
> 0                 0
>     ReadRepairStage                   0         0          11108
> 0                 0
>     ReplicateOnWriteStage             0         0              0
> 0                 0
>     GossipStage                       0         0        5259908
> 0                 0
>     CacheCleanupExecutor              0         0              0
> 0                 0
>     MigrationStage                    0         0             30
> 0                 0
>     MemoryMeter                       0         0          16563
> 0                 0
>     FlushWriter                       0         0          39637
> 0                26
>     ValidationExecutor                0         0          19013
> 0                 0
>     InternalResponseStage             0         0              9
> 0                 0
>     AntiEntropyStage                  0         0          38026
> 0                 0
>     MemtablePostFlusher               0         0          81740
> 0                 0
>     MiscStage                         0         0          19196
> 0                 0
>     PendingRangeCalculator            0         0             23
> 0                 0
>     CompactionExecutor                0         0          61629
> 0                 0
>     commitlog_archiver                0         0              0
> 0                 0
>     HintedHandoff                     0         0             63
> 0                 0
>
>     Message type           Dropped
>     RANGE_SLICE                  0
>     READ_REPAIR                  0
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                       640
>     MUTATION                     0
>     _TRACE                       0
>     REQUEST_RESPONSE             0
>     COUNTER_MUTATION             0
>
> Bad node:
>     Pool Name                    Active   Pending      Completed
> Blocked  All time blocked
>     ReadStage                        32       113          52216
> 0                 0
>     RequestResponseStage              0         0           4167
> 0                 0
>     MutationStage                     0         0         127559
> 0                 0
>     ReadRepairStage                   0         0            125
> 0                 0
>     ReplicateOnWriteStage             0         0              0
> 0                 0
>     GossipStage                       0         0           9965
> 0                 0
>     CacheCleanupExecutor              0         0              0
> 0                 0
>     MigrationStage                    0         0              0
> 0                 0
>     MemoryMeter                       0         0             24
> 0                 0
>     FlushWriter                       0         0             27
> 0                 1
>     ValidationExecutor                0         0              0
> 0                 0
>     InternalResponseStage             0         0              0
> 0                 0
>     AntiEntropyStage                  0         0              0
> 0                 0
>     MemtablePostFlusher               0         0             96
> 0                 0
>     MiscStage                         0         0              0
> 0                 0
>     PendingRangeCalculator            0         0             10
> 0                 0
>     CompactionExecutor                1         1             73
> 0                 0
>     commitlog_archiver                0         0              0
> 0                 0
>     HintedHandoff                     0         0             15
> 0                 0
>
>     Message type           Dropped
>     RANGE_SLICE                130
>     READ_REPAIR                  1
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                     31032
>     MUTATION                   865
>     _TRACE                       0
>     REQUEST_RESPONSE             7
>     COUNTER_MUTATION             0
>
>
> [1] `nodetool status` output:
>
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens  Owns   Host
> ID                               Rack
>     UN  A (Good)        252.37 GB  256     23.0%
> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>     UN  B (Good)        245.91 GB  256     24.4%
> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>     UN  C (Good)        254.79 GB  256     23.7%
> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>     UN  D (Bad)         163.85 GB  256     28.8%
> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>
> [2] Disk read/write ops:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>
> [3] Network in/out:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>
>
>

Re: New node has high network and disk usage.

Posted by Jeff Ferland <jb...@tubularlabs.com>.
What’s your output of `nodetool compactionstats`?

> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vi...@idioplatform.com> wrote:
> 
> Hi,
> 
> We recently added a new node to our cluster in order to replace a node that died (hardware failure we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.
> 
> `nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.
> 
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.
> 
> Disk Activity[2] and Network activity[3] on this node is far higher than the rest.
> 
> The only other difference this node has to the rest of the cluster is that its on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.
> 
> Nothing of note in system.log.
> 
> What should our next step be in trying to diagnose this issue?
> 
> Best wishes,
> Vic
> 
> [0] `nodetool tpstats` output:
> 
> Good node:
>     Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>     ReadStage                         0         0       46311521         0                 0
>     RequestResponseStage              0         0       23817366         0                 0
>     MutationStage                     0         0       47389269         0                 0
>     ReadRepairStage                   0         0          11108         0                 0
>     ReplicateOnWriteStage             0         0              0         0                 0
>     GossipStage                       0         0        5259908         0                 0
>     CacheCleanupExecutor              0         0              0         0                 0
>     MigrationStage                    0         0             30         0                 0
>     MemoryMeter                       0         0          16563         0                 0
>     FlushWriter                       0         0          39637         0                26
>     ValidationExecutor                0         0          19013         0                 0
>     InternalResponseStage             0         0              9         0                 0
>     AntiEntropyStage                  0         0          38026         0                 0
>     MemtablePostFlusher               0         0          81740         0                 0
>     MiscStage                         0         0          19196         0                 0
>     PendingRangeCalculator            0         0             23         0                 0
>     CompactionExecutor                0         0          61629         0                 0
>     commitlog_archiver                0         0              0         0                 0
>     HintedHandoff                     0         0             63         0                 0
> 
>     Message type           Dropped
>     RANGE_SLICE                  0
>     READ_REPAIR                  0
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                       640
>     MUTATION                     0
>     _TRACE                       0
>     REQUEST_RESPONSE             0
>     COUNTER_MUTATION             0
> 
> Bad node:
>     Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>     ReadStage                        32       113          52216         0                 0
>     RequestResponseStage              0         0           4167         0                 0
>     MutationStage                     0         0         127559         0                 0
>     ReadRepairStage                   0         0            125         0                 0
>     ReplicateOnWriteStage             0         0              0         0                 0
>     GossipStage                       0         0           9965         0                 0
>     CacheCleanupExecutor              0         0              0         0                 0
>     MigrationStage                    0         0              0         0                 0
>     MemoryMeter                       0         0             24         0                 0
>     FlushWriter                       0         0             27         0                 1
>     ValidationExecutor                0         0              0         0                 0
>     InternalResponseStage             0         0              0         0                 0
>     AntiEntropyStage                  0         0              0         0                 0
>     MemtablePostFlusher               0         0             96         0                 0
>     MiscStage                         0         0              0         0                 0
>     PendingRangeCalculator            0         0             10         0                 0
>     CompactionExecutor                1         1             73         0                 0
>     commitlog_archiver                0         0              0         0                 0
>     HintedHandoff                     0         0             15         0                 0
> 
>     Message type           Dropped
>     RANGE_SLICE                130
>     READ_REPAIR                  1
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                     31032
>     MUTATION                   865
>     _TRACE                       0
>     REQUEST_RESPONSE             7
>     COUNTER_MUTATION             0
> 
> 
> [1] `nodetool status` output:
> 
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens  Owns   Host ID                               Rack
>     UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>     UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>     UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>     UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
> 
> [2] Disk read/write ops:
> 
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png <https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png>
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png <https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png>
> 
> [3] Network in/out:
> 
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png <https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png>
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png <https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png>


Re: New node has high network and disk usage.

Posted by Vickrum Loi <vi...@idioplatform.com>.
I should probably have mentioned that we're on Cassandra 2.0.10.
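
(For completeness, a quick way to confirm every node is on the same version, in case that turns out to be relevant:)

    nodetool version    # run on each node; all should report ReleaseVersion: 2.0.10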

On 6 January 2016 at 15:26, Vickrum Loi <vi...@idioplatform.com>
wrote:

> Hi,
>
> We recently added a new node to our cluster in order to replace a node
> that died (hardware failure we believe). For the next two weeks it had high
> disk and network activity. We replaced the server, but it's happened again.
> We've looked into memory allowances, disk performance, number of
> connections, and all the nodetool stats, but can't find the cause of the
> issue.
>
> `nodetool tpstats`[0] shows a lot of active and pending threads, in
> comparison to the rest of the cluster, but that's likely a symptom, not a
> cause.
>
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node
> (D) has less data.
>
> Disk Activity[2] and Network activity[3] on this node is far higher than
> the rest.
>
> The only other difference this node has to the rest of the cluster is that
> its on the ext4 filesystem, whereas the rest are ext3, but we've done
> plenty of testing there and can't see how that would affect performance on
> this node so much.
>
> Nothing of note in system.log.
>
> What should our next step be in trying to diagnose this issue?
>
> Best wishes,
> Vic
>
> [0] `nodetool tpstats` output:
>
> Good node:
>     Pool Name                    Active   Pending      Completed
> Blocked  All time blocked
>     ReadStage                         0         0       46311521
> 0                 0
>     RequestResponseStage              0         0       23817366
> 0                 0
>     MutationStage                     0         0       47389269
> 0                 0
>     ReadRepairStage                   0         0          11108
> 0                 0
>     ReplicateOnWriteStage             0         0              0
> 0                 0
>     GossipStage                       0         0        5259908
> 0                 0
>     CacheCleanupExecutor              0         0              0
> 0                 0
>     MigrationStage                    0         0             30
> 0                 0
>     MemoryMeter                       0         0          16563
> 0                 0
>     FlushWriter                       0         0          39637
> 0                26
>     ValidationExecutor                0         0          19013
> 0                 0
>     InternalResponseStage             0         0              9
> 0                 0
>     AntiEntropyStage                  0         0          38026
> 0                 0
>     MemtablePostFlusher               0         0          81740
> 0                 0
>     MiscStage                         0         0          19196
> 0                 0
>     PendingRangeCalculator            0         0             23
> 0                 0
>     CompactionExecutor                0         0          61629
> 0                 0
>     commitlog_archiver                0         0              0
> 0                 0
>     HintedHandoff                     0         0             63
> 0                 0
>
>     Message type           Dropped
>     RANGE_SLICE                  0
>     READ_REPAIR                  0
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                       640
>     MUTATION                     0
>     _TRACE                       0
>     REQUEST_RESPONSE             0
>     COUNTER_MUTATION             0
>
> Bad node:
>     Pool Name                    Active   Pending      Completed
> Blocked  All time blocked
>     ReadStage                        32       113          52216
> 0                 0
>     RequestResponseStage              0         0           4167
> 0                 0
>     MutationStage                     0         0         127559
> 0                 0
>     ReadRepairStage                   0         0            125
> 0                 0
>     ReplicateOnWriteStage             0         0              0
> 0                 0
>     GossipStage                       0         0           9965
> 0                 0
>     CacheCleanupExecutor              0         0              0
> 0                 0
>     MigrationStage                    0         0              0
> 0                 0
>     MemoryMeter                       0         0             24
> 0                 0
>     FlushWriter                       0         0             27
> 0                 1
>     ValidationExecutor                0         0              0
> 0                 0
>     InternalResponseStage             0         0              0
> 0                 0
>     AntiEntropyStage                  0         0              0
> 0                 0
>     MemtablePostFlusher               0         0             96
> 0                 0
>     MiscStage                         0         0              0
> 0                 0
>     PendingRangeCalculator            0         0             10
> 0                 0
>     CompactionExecutor                1         1             73
> 0                 0
>     commitlog_archiver                0         0              0
> 0                 0
>     HintedHandoff                     0         0             15
> 0                 0
>
>     Message type           Dropped
>     RANGE_SLICE                130
>     READ_REPAIR                  1
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                     31032
>     MUTATION                   865
>     _TRACE                       0
>     REQUEST_RESPONSE             7
>     COUNTER_MUTATION             0
>
>
> [1] `nodetool status` output:
>
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens  Owns   Host
> ID                               Rack
>     UN  A (Good)        252.37 GB  256     23.0%
> 9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>     UN  B (Good)        245.91 GB  256     24.4%
> 6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>     UN  C (Good)        254.79 GB  256     23.7%
> f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>     UN  D (Bad)         163.85 GB  256     28.8%
> faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>
> [2] Disk read/write ops:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>
> [3] Network in/out:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>