Posted to dev@cassandra.apache.org by varun saluja <sa...@gmail.com> on 2017/05/11 16:50:18 UTC
Dropped Mutation and Read messages.
Hi Experts,
Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster (v2.1.7).
TPS on the nodes was high. The job ran for more than two days, and thereafter the load average on one of the nodes climbed very high (loadavg around 29).
System log reports:
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
The job was stopped due to the heavy load, but still, 12 hours later, we can see dropped MUTATION messages and sudden increases in load average.
Are these hinted-handoff mutations? Can we stop them?
Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load or any such activity.
Due to heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?
PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue (dropped mutation messages) started again.
Thanks and Regards,
Varun Saluja
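Whether the lingering writes are hint deliveries can be checked with the standard nodetool commands; a sketch, assuming a Cassandra 2.1 nodetool on the PATH (note that truncating hints discards the buffered writes, so a repair is needed afterwards):

```shell
# Per-pool stats, including dropped message counts and the HintedHandoff pool
nodetool tpstats

# Pause hint delivery from this node while the cluster catches up
nodetool disablehandoff

# Discard all stored hints on this node; the dropped writes must then be
# recovered with a repair of the affected keyspaces
nodetool truncatehints
nodetool repair walletkeyspace

# Re-enable hint delivery once load is back to normal
nodetool enablehandoff
```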
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
Re: Dropped Mutation and Read messages.
Posted by Oskar Kjellin <os...@gmail.com>.
Indeed, sorry. Subscribed to both so missed which one this was.
Sent from my iPhone
> On 11 May 2017, at 19:56, Michael Kjellman <mk...@internalcircle.com> wrote:
>
> This discussion should be on the C* user mailing list. Thanks!
>
> best,
> kjellman
>
>> On May 11, 2017, at 10:53 AM, Oskar Kjellin <os...@gmail.com> wrote:
>>
>> That seems way too low. Depending on what type of disk you have, it should be closer to 100-200 MB/s.
>> That's probably what's causing your problems. It would still take a while to compact all your data, though.
>>
>> Sent from my iPhone
>>
>>> On 11 May 2017, at 19:50, varun saluja <sa...@gmail.com> wrote:
>>>
>>> nodetool getcompactionthroughput
>>>
>>> ./nodetool getcompactionthroughput
>>> Current compaction throughput: 16 MB/s
>>>
>>> Regards,
>>> Varun Saluja
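To put 16 MB/s in perspective: a rough estimate of how long compacting the ~400 GB-per-node backlog mentioned earlier in the thread would take at different throughput caps (plain arithmetic, not a Cassandra API):

```python
# Rough hours needed to compact a backlog at a given compaction throughput cap.
# 400 GB/node is the figure quoted earlier in this thread; the throughput
# values are illustrative (16 MB/s is this cluster's current setting).
def hours_to_compact(backlog_gb: float, throughput_mb_s: float) -> float:
    backlog_mb = backlog_gb * 1024          # GB -> MB
    seconds = backlog_mb / throughput_mb_s  # MB / (MB/s)
    return seconds / 3600

for mbps in (16, 100, 200):
    print(f"{mbps:>3} MB/s -> {hours_to_compact(400, mbps):.1f} h")
# 16 MB/s -> 7.1 h; 100 MB/s -> 1.1 h; 200 MB/s -> 0.6 h
```

The cap can be raised at runtime, without a restart, via `nodetool setcompactionthroughput <MB/s>` (0 disables throttling entirely).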
>>>
>>>> On 11 May 2017 at 23:18, varun saluja <sa...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> Hi, please find below the results for the same. The numbers are scary here.
>>>>
>>>> [root@WA-CASSDB2 bin]# ./nodetool compactionstats
>>>> pending tasks: 137
>>>> compaction type keyspace table completed total unit progress
>>>> Compaction system hints 5762711108 837522028005 bytes 0.69%
>>>> Compaction walletkeyspace user_txn_history_v2 101477894 4722068388 bytes 2.15%
>>>> Compaction walletkeyspace user_txn_history_v2 1511866634 753221762663 bytes 0.20%
>>>> Compaction walletkeyspace user_txn_history_v2 3664734135 18605501268 bytes 19.70%
>>>> Active compaction remaining time : 26h32m28s
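The 26h32m28s estimate above is simply remaining bytes divided by the configured compaction throughput; it can be reproduced from the listed numbers (a sketch, assuming the 16 MB/s setting means 16 x 2^20 bytes/s):

```python
# Reproduce Cassandra's "Active compaction remaining time" estimate:
# remaining bytes across all running compactions, divided by the
# configured compaction throughput (16 MB/s on this cluster).

# (completed, total) byte counts from the compactionstats output above
compactions = [
    (5762711108, 837522028005),   # system.hints
    (101477894, 4722068388),      # walletkeyspace.user_txn_history_v2
    (1511866634, 753221762663),   # walletkeyspace.user_txn_history_v2
    (3664734135, 18605501268),    # walletkeyspace.user_txn_history_v2
]

throughput_bytes_per_s = 16 * 1024 * 1024  # 16 MB/s

remaining_bytes = sum(total - completed for completed, total in compactions)
remaining_s = remaining_bytes // throughput_bytes_per_s

h, rem = divmod(remaining_s, 3600)
m, s = divmod(rem, 60)
print(f"{h}h{m}m{s}s")  # matches the reported "26h32m28s"
```

Note that the system.hints compaction alone covers ~780 GB, which answers the earlier question: the lingering mutations are largely hint traffic.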
>>>>
>>>>
>>>>
>>>>> On 11 May 2017 at 23:15, Oskar Kjellin <os...@gmail.com> wrote:
>>>>> What does nodetool compactionstats show?
>>>>>
>>>>> I meant compaction throttling. nodetool getcompactionthroughput
>>>>>
>>>>>
>>>>>> On 11 May 2017, at 19:41, varun saluja <sa...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Oskar,
>>>>>>
>>>>>> Thanks for response.
>>>>>>
>>>>>> Yes, I can see a lot of compaction threads. We are loading around 400GB of data per node on a 3-node Cassandra cluster.
>>>>>> Throttling was set to write around 7k TPS per node. The job ran fine for 2 days, and then we started getting mutation drops, longer GCs, and very high load on the system.
>>>>>>
>>>>>> System log reports:
>>>>>> Enqueuing flush of compactions_in_progress: 1156 (0%) on-heap, 1132 (0%) off-heap
>>>>>>
>>>>>> The job was stopped 12 hours back, but these failures can still be seen. Can you please let me know how I should proceed? If possible, please suggest some parameters for write-intensive jobs.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Varun Saluja
>>>>>>
>>>>>>
>>>>>>> On 11 May 2017 at 23:01, Oskar Kjellin <os...@gmail.com> wrote:
>>>>>>> Do you have a lot of compactions going on? It sounds like you might've built up a huge backlog. Is your throttling configured properly?
>>>>>>>
>>>>>>>> On 11 May 2017, at 18:50, varun saluja <sa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Experts,
>>>>>>>>
>>>>>>>> Seeking your help on a production issue. We were running a write-intensive job on our 3-node Cassandra cluster (v2.1.7).
>>>>>>>>
>>>>>>>> TPS on the nodes was high. The job ran for more than two days, and thereafter the load average on one of the nodes climbed very high (loadavg around 29).
>>>>>>>>
>>>>>>>> System log reports:
>>>>>>>>
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 839 MUTATION messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 2 READ messages dropped in last 5000ms
>>>>>>>> INFO [ScheduledTasks:1] 2017-05-11 22:11:04,466 MessagingService.java:888 - 1 REQUEST_RESPONSE messages dropped in last 5000ms
>>>>>>>>
>>>>>>>> The job was stopped due to the heavy load, but still, 12 hours later, we can see dropped MUTATION messages and sudden increases in load average.
>>>>>>>>
>>>>>>>> Are these hinted-handoff mutations? Can we stop them?
>>>>>>>> Strangely, this behaviour is seen on only 2 nodes. Node 1 does not show any load or any such activity.
>>>>>>>>
>>>>>>>> Due to heavy load and GC, there are intermittent gossip failures among the nodes. Can someone please help?
>>>>>>>>
>>>>>>>> PS: The load job was stopped on the cluster. Everything ran fine for a few hours, and later the issue (dropped mutation messages) started again.
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Varun Saluja
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org