Posted to dev@cassandra.apache.org by varun saluja <sa...@gmail.com> on 2017/05/11 16:50:18 UTC
Dropped Mutation and Read messages.
Hi Experts,
Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster (v2.1.7).
TPS on the nodes was high. The job ran for more than two days, and thereafter the load average on one of the nodes climbed very high (loadavg around 29).
System log reports:
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
The job was stopped due to the heavy load, but still, 12 hours later, we can see dropped MUTATION messages and sudden increases in load average.
Are these hinted-handoff mutations? Can we stop them?
Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load or any such activity.
Due to heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?
PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue (dropped mutation messages) started again.
Thanks and Regards,
Varun Saluja
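Whether the lingering writes are hint deliveries can be checked with the standard nodetool commands; a sketch, assuming a Cassandra 2.1 nodetool on the PATH (note that truncating hints discards the buffered writes, so a repair is needed afterwards):

```shell
# Per-pool stats, including dropped message counts and the HintedHandoff pool
nodetool tpstats

# Pause hint delivery from this node while the cluster catches up
nodetool disablehandoff

# Discard all stored hints on this node; the dropped writes must then be
# recovered with a repair of the affected keyspaces
nodetool truncatehints
nodetool repair walletkeyspace

# Re-enable hint delivery once load is back to normal
nodetool enablehandoff
```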
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
Re: Dropped Mutation and Read messages.
Posted by Oskar Kjellin <os...@gmail.com>.
Indeed, sorry. Subscribed to both so missed which one this was.
Sent from my iPhone
> On 11 May 2017, at 19:56, Michael Kjellman <mk...@internalcircle.com> wrote:
>
> This discussion should be on the C* user mailing list. Thanks!
>
> best,
> kjellman
>
>> On May 11, 2017, at 10:53 AM, Oskar Kjellin <os...@gmail.com> wrote:
>>
>> That seems way too low. Depending on what type of disk you have, it should be closer to 100-200 MB/s.
>> That's probably what's causing your problems. It would still take a while to compact all your data, though.
>>
>> Sent from my iPhone
>>
>>> On 11 May 2017, at 19:50, varun saluja <sa...@gmail.com> wrote:
>>>
>>> nodetool getcompactionthroughput
>>>
>>> ./nodetool getcompactionthroughput
>>> Current compaction throughput: 16 MB/s
>>>
>>> Regards,
>>> Varun Saluja
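To put 16 MB/s in perspective: a rough estimate of how long compacting the ~400 GB-per-node backlog mentioned earlier in the thread would take at different throughput caps (plain arithmetic, not a Cassandra API):

```python
# Rough hours needed to compact a backlog at a given compaction throughput cap.
# 400 GB/node is the figure quoted earlier in this thread; the throughput
# values are illustrative (16 MB/s is this cluster's current setting).
def hours_to_compact(backlog_gb: float, throughput_mb_s: float) -> float:
    backlog_mb = backlog_gb * 1024          # GB -> MB
    seconds = backlog_mb / throughput_mb_s  # MB / (MB/s)
    return seconds / 3600

for mbps in (16, 100, 200):
    print(f"{mbps:>3} MB/s -> {hours_to_compact(400, mbps):.1f} h")
# 16 MB/s -> 7.1 h; 100 MB/s -> 1.1 h; 200 MB/s -> 0.6 h
```

The cap can be raised at runtime, without a restart, via `nodetool setcompactionthroughput <MB/s>` (0 disables throttling entirely).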
>>>
>>>> On 11 May 2017 at 23:18, varun saluja <sa...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Hi, please find below the results for the same. The numbers are scary here.
>>>>
>>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>>> pending tasks: 137
>>>> compaction type keyspace table completed total unit progress
>>>> Compaction system hints 5762711108 837522028005 bytes 0.69%
>>>> Compaction walletkeyspace user_txn_history_v2 101477894 4722068388 bytes 2.15%
>>>> Compaction walletkeyspace user_txn_history_v2 1511866634 753221762663 bytes 0.20%
>>>> Compaction walletkeyspace user_txn_history_v2 3664734135 18605501268 bytes 19.70%
>>>> Active compaction remaining time : 26h32m28s
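The 26h32m28s estimate above is simply remaining bytes divided by the configured compaction throughput; it can be reproduced from the listed numbers (a sketch, assuming the 16 MB/s setting means 16 x 2^20 bytes/s):

```python
# Reproduce Cassandra's "Active compaction remaining time" estimate:
# remaining bytes across all running compactions, divided by the
# configured compaction throughput (16 MB/s on this cluster).

# (completed, total) byte counts from the compactionstats output above
compactions = [
    (5762711108, 837522028005),   # system.hints
    (101477894, 4722068388),      # walletkeyspace.user_txn_history_v2
    (1511866634, 753221762663),   # walletkeyspace.user_txn_history_v2
    (3664734135, 18605501268),    # walletkeyspace.user_txn_history_v2
]

throughput_bytes_per_s = 16 * 1024 * 1024  # 16 MB/s

remaining_bytes = sum(total - completed for completed, total in compactions)
remaining_s = remaining_bytes // throughput_bytes_per_s

h, rem = divmod(remaining_s, 3600)
m, s = divmod(rem, 60)
print(f"{h}h{m}m{s}s")  # matches the reported "26h32m28s"
```

Note that the system.hints compaction alone covers ~780 GB, which answers the earlier question: the lingering mutations are largely hint traffic.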
>>>>
>>>>
>>>>
>>>>> On 11 May 2017 at 23:15, Oskar Kjellin <os...@gmail.com> wrote:
>>>>> What does nodetool compactionstats show?
>>>>>
>>>>> I meant compaction throttling. nodetool getcompactionthroughput
>>>>>
>>>>>
>>>>>> On 11 May 2017, at 19:41, varun saluja <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Oskar,
>>>>>>
>>>>>> Thanks for response.
>>>>>>
>>>>>> Yes, I can see a lot of compaction threads. We are loading around 400GB of data per node on a 3-node Cassandra cluster.
>>>>>> Throttling was set to write around 7k TPS per node. The job ran fine for 2 days, and then we started getting mutation drops, longer GCs, and very high load on the system.
>>>>>>
>>>>>> System log reports:
>>>>>> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap
>>>>>>
>>>>>> The job was stopped 12 hours back, but these failures can still be seen. Can you please let me know how I should proceed? If possible, please suggest some parameters for write-intensive jobs.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Varun Saluja
>>>>>>
>>>>>>
>>>>>>> On 11 May 2017 at 23:01, Oskar Kjellin <os...@gmail.com> wrote:
>>>>>>> Do you have a lot of compactions going on? It sounds like you might've built up a huge backlog. Is your throttling configured properly?
>>>>>>>
>>>>>>>> On 11 May 2017, at 18:50, varun saluja <sa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Experts,
>>>>>>>>
>>>>>>>> Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster (v2.1.7).
>>>>>>>>
>>>>>>>> TPS on the nodes was high. The job ran for more than two days, and thereafter the load average on one of the nodes climbed very high (loadavg around 29).
>>>>>>>>
>>>>>>>> System log reports:
>>>>>>>>
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
>>>>>>>>
>>>>>>>> The job was stopped due to the heavy load, but still, 12 hours later, we can see dropped MUTATION messages and sudden increases in load average.
>>>>>>>>
>>>>>>>> Are these hinted-handoff mutations? Can we stop them?
>>>>>>>> Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load or any such activity.
>>>>>>>>
>>>>>>>> Due to heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?
>>>>>>>>
>>>>>>>> PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue (dropped mutation messages) started again.
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Varun Saluja
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org