Posted to user@cassandra.apache.org by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br> on 2014/06/18 17:13:59 UTC

Batch of prepared statements exceeding specified threshold

I have a 10-node cluster with Cassandra 2.0.8.

I am getting these warnings in the log when I run my code. All my code
does is read data from a CF and, in some cases, write new data.

 WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
BatchStatement.java (line 228) Batch of prepared statements for
[identification1.entity, identification1.entity_lookup] is of size 6165,
exceeding specified threshold of 5120 by 1045.
 WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
BatchStatement.java (line 228) Batch of prepared statements for
[identification1.entity, identification1.entity_lookup] is of size 21266,
exceeding specified threshold of 5120 by 16146.
 WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
BatchStatement.java (line 228) Batch of prepared statements for
[identification1.entity, identification1.entity_lookup] is of size 22978,
exceeding specified threshold of 5120 by 17858.
 INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms for
1024 cells

After some time, one node of the cluster goes down. It comes back after a
few seconds and then another node goes down. This keeps happening: there is
always one node down in the cluster, and when it comes back up another one
falls.

The only exceptions I see in the log are "connection reset by peer" errors,
which seem to be related to the gossip protocol, when a node goes down.

Any hints on what I could do to investigate this problem further?

Best regards,
Marcelo Valle.

Re: Batch of prepared statements exceeding specified threshold

Posted by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br>.
Hi,

I think it's a bit late for this reply, but anyway...
We hired support from http://thelastpickle.com/ and, thanks to them, we
were able to solve our issue.
What was causing this behavior was a large query being executed by mistake
in our code.
It was necessary to open the Java heap dump alongside the Cassandra source
and find the query taking up the large amount of memory.
Lesson learned for me: Cassandra support is something really needed and
useful, and it's not like Oracle's, IBM's, etc...

Best regards,
Marcelo Valle.
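
For context on what a runaway read like that can do: with native-protocol
paging, the driver can bound how many rows a single SELECT pulls per round
trip instead of materializing the whole result set at once. Below is a
minimal sketch only, assuming the DataStax Java driver 2.0 and the
identification1.entity table from the warnings with hypothetical columns
(the thread never shows the real schema or client code).

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class PagedRead {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("identification1");

            // A hypothetical wide read; executed by mistake without paging,
            // a query like this can pull a huge result set in one go.
            Statement select = new SimpleStatement("SELECT id, data FROM entity");
            select.setFetchSize(100); // at most 100 rows per page

            for (Row row : session.execute(select)) {
                // Iterating the ResultSet fetches further pages transparently.
                System.out.println(row.getString("id"));
            }

            cluster.close();
        }
    }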




Re: Batch of prepared statements exceeding specified threshold

Posted by Pavel Kogan <pa...@cortica.com>.
Ok, in my case it was straightforward. It is just a warning, which however
says that batches with a large data size (above 5 KB) can sometimes lead to
node instability (why?). This limit seems to be hard-coded; I didn't find
any way to configure it externally. Anyway, removing the batch and giving
up atomicity resolved the issue for me.

http://mail-archives.apache.org/mod_mbox/cassandra-commits/201404.mbox/%3Ceee5dd5bc4794ef0b5c5153fdb583ac0@git.apache.org%3E
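
A minimal sketch of that workaround, i.e. issuing the two inserts as
independent statements instead of one batch. It assumes the DataStax Java
driver 2.0 and hypothetical entity / entity_lookup schemas (the thread
never shows the real ones); atomicity across the two tables is lost, which
is the trade-off Pavel describes.

    import java.nio.ByteBuffer;
    import com.datastax.driver.core.*;

    public class UnbatchedWrites {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("identification1");

            // Hypothetical schemas: entity(id text PRIMARY KEY, data blob)
            //                       entity_lookup(lookup_key text PRIMARY KEY, id text)
            PreparedStatement insertEntity =
                session.prepare("INSERT INTO entity (id, data) VALUES (?, ?)");
            PreparedStatement insertLookup =
                session.prepare("INSERT INTO entity_lookup (lookup_key, id) VALUES (?, ?)");

            ByteBuffer blob = ByteBuffer.wrap(new byte[100 * 1024]); // ~100 KB payload

            // Two independent (async) writes instead of one logged batch:
            // atomicity across the two tables is lost, but no single request
            // carries both payloads, so the batch size warning no longer applies.
            ResultSetFuture f1 = session.executeAsync(insertEntity.bind("id-1", blob));
            ResultSetFuture f2 = session.executeAsync(insertLookup.bind("key-1", "id-1"));
            f1.getUninterruptibly();
            f2.getUninterruptibly();

            cluster.close();
        }
    }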



Re: Batch of prepared statements exceeding specified threshold

Posted by Pavel Kogan <pa...@cortica.com>.
Logged batch.


On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan <do...@gmail.com> wrote:

> And Pavel, when you said batch, did you mean LOGGED batch or UNLOGGED
> batch?

Re: Batch of prepared statements exceeding specified threshold

Posted by DuyHai Doan <do...@gmail.com>.
I think some figures from "nodetool tpstats" and "nodetool compactionstats"
may help to see things more clearly.

And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED
batch?
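
For reference, the distinction in code. This is a sketch only, assuming the
DataStax Java driver 2.0 (the thread never names the client) and the same
hypothetical entity / entity_lookup schemas as above.

    import java.nio.ByteBuffer;
    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class BatchTypes {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("identification1");

            // Hypothetical schemas (the thread never shows the real ones).
            PreparedStatement insertEntity =
                session.prepare("INSERT INTO entity (id, data) VALUES (?, ?)");
            PreparedStatement insertLookup =
                session.prepare("INSERT INTO entity_lookup (lookup_key, id) VALUES (?, ?)");

            // LOGGED (the driver's default): the coordinator writes the batch
            // to the batchlog first, so either all statements eventually apply
            // or none do.
            BatchStatement logged = new BatchStatement(BatchStatement.Type.LOGGED);
            logged.add(insertEntity.bind("id-1", ByteBuffer.allocate(16)));
            logged.add(insertLookup.bind("key-1", "id-1"));
            session.execute(logged);

            // UNLOGGED: no batchlog and no atomicity guarantee, but less work
            // for the coordinator.
            BatchStatement unlogged = new BatchStatement(BatchStatement.Type.UNLOGGED);
            unlogged.add(insertEntity.bind("id-2", ByteBuffer.allocate(16)));
            unlogged.add(insertLookup.bind("key-2", "id-2"));
            session.execute(unlogged);

            cluster.close();
        }
    }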






Re: Batch of prepared statements exceeding specified threshold

Posted by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br>.
If you have 32 GB of RAM, the heap is probably 8 GB.
200 writes of 100 KB/s would be 20 MB/s in the worst case, supposing all
writes of a replica go to a single node.
I really don't see any reason why it should be filling up the heap.
Anyone else?

But did you check the logs for the GCInspector?
In my case, nodes are falling because of the heap; in your case, maybe it's
something else.
Do you see increased times when looking for GCInspector in the logs?

[]s
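
A rough sketch of the heap-sizing rule behind the "probably 8 GB" estimate,
as I recall it from the stock cassandra-env.sh of the 2.0 line (an
assumption from memory; check your own cassandra-env.sh for the exact
formula):

    public class DefaultHeapEstimate {
        // Approximation of the heap-sizing rule in the stock cassandra-env.sh
        // of the 2.0 era (assumption; verify against your own file):
        //   MAX_HEAP_SIZE = max(min(1/2 * RAM, 1024 MB), min(1/4 * RAM, 8192 MB))
        static long defaultMaxHeapMb(long systemMemoryMb) {
            long half = Math.min(systemMemoryMb / 2, 1024);
            long quarter = Math.min(systemMemoryMb / 4, 8192);
            return Math.max(half, quarter);
        }

        public static void main(String[] args) {
            // 32 GB of RAM -> 8192 MB, matching the "probably 8 GB" estimate.
            System.out.println(defaultMaxHeapMb(32 * 1024) + " MB");
        }
    }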




Re: Batch of prepared statements exceeding specified threshold

Posted by Pavel Kogan <pa...@cortica.com>.
Hi Marcelo,

No pending write tasks. I am writing a lot: about 100-200 writes, each up
to 100 KB, every 15 s.
It is running on a decent cluster of 5 identical nodes: quad-core i7 with
32 GB RAM and a 480 GB SSD.

Regards,
  Pavel



Re: Batch of prepared statements exceeding specified threshold

Posted by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br>.
Pavel,

In my case, the heap was filling up faster than it was draining. I am
still looking for the cause of it, as I should be able to drain really
fast with SSDs.

However, in your case you could check (AFAIK) nodetool tpstats and see if
there are too many pending write tasks, for instance. Maybe you really are
writing more than the nodes are able to flush to disk.

How many writes per second are you achieving?

Also, I would look for GCInspector in the log:

cat system.log* | grep GCInspector | wc -l
tail -1000 system.log | grep GCInspector

Do you see it running a lot? Is it taking much more time to run each time
it runs?

I am no Cassandra expert, but I would try these things first and post the
results here. Maybe other people in the list have more ideas.

Best regards,
Marcelo.



Re: Batch of prepared statements exceeding specified threshold

Posted by Pavel Kogan <pa...@cortica.com>.
The cluster is new, so no updates were done. Version 2.0.8.
It happened when I did many writes (no reads). Writes are done in small
batches of 2 inserts (writing to 2 column families). The values are big
blobs (up to 100 KB).

Any clues?

Pavel
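
A back-of-the-envelope check of why even a two-statement batch trips the
warning: the threshold in the log lines at the top of the thread is 5120
bytes, while a single blob here can be around 100 KB. This is plain Java
and only a rough lower bound; the payload and column names are
hypothetical, and the exact size accounting is Cassandra's own.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    public class BatchSizeCheck {
        // The warn threshold from the log lines above: 5120 bytes.
        static final int WARN_THRESHOLD_BYTES = 5 * 1024;

        public static void main(String[] args) {
            // A single ~100 KB blob, as Pavel describes (hypothetical payload).
            ByteBuffer blob = ByteBuffer.wrap(new byte[100 * 1024]);
            String id = "id-1";
            String lookupKey = "key-1";

            // Rough lower bound on the batch's data size: just the bound values
            // of the two inserts, ignoring per-statement and per-column overhead.
            long approxSize = blob.remaining()
                    + 2L * id.getBytes(StandardCharsets.UTF_8).length
                    + lookupKey.getBytes(StandardCharsets.UTF_8).length;

            System.out.printf("~%d bytes vs. threshold %d -> warning expected: %b%n",
                    approxSize, WARN_THRESHOLD_BYTES,
                    approxSize > WARN_THRESHOLD_BYTES);
        }
    }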



Re: Batch of prepared statements exceeding specified threshold

Posted by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br>.
Pavel,

Out of curiosity, did it start to happen before some update? Which version
of Cassandra are you using?

[]s



Re: Batch of prepared statements exceeding specified threshold

Posted by Marcelo Elias Del Valle <ma...@s1mbi0se.com.br>.
I know now it's been caused by the heap filling up in some nodes. When it
fills up, the node goes down, GC runs more, then the node comes up again.
Looking for GCInspector in the log, I see GC takes more time to run each
time it runs, as shown below.
I have set the key cache to 100 MB, and I used to query many more rows in
Cassandra before...

I am trying to find some query I do that returns a lot of columns, but
usually it returns just 1 row, and each row usually has 100 columns at
most...
I will keep looking, but is it possible this is a bug? I am using the
newest version of Cassandra.

INFO [ScheduledTasks:1] 2014-06-19 19:36:36,240 GCInspector.java (line 116)
GC for ConcurrentMarkSweep: 14837 ms for 1 collections, 8245789072 used;
max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:36:57,621 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20637 ms for 1 collections, 8403381728
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:37:13,291 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15134 ms for 1 collections, 8398383880
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:37:34,775 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20897 ms for 1 collections, 8404085176
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:37:50,364 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 14926 ms for 1 collections, 8293046264
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:38:11,762 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20705 ms for 1 collections, 8426815144
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:38:27,413 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15172 ms for 1 collections, 8426043120
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:38:48,956 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20993 ms for 1 collections, 8425551136
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:39:04,827 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15374 ms for 1 collections, 8426721952
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:39:26,319 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20958 ms for 1 collections, 8431842432
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:39:41,996 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 14984 ms for 1 collections, 8422533432
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:40:03,351 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 21220 ms for 1 collections, 8422545360
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:40:18,866 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15135 ms for 1 collections, 8426651232
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:40:40,305 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 20953 ms for 1 collections, 8462319400
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:41:08,079 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 27551 ms for 2 collections, 8463374528
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:41:29,510 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 21189 ms for 1 collections, 8466512144
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:41:44,936 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15220 ms for 1 collections, 8470873096
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:42:06,300 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 21178 ms for 1 collections, 8471013184
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:42:21,712 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15227 ms for 1 collections, 8476991784
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:42:43,068 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 21199 ms for 1 collections, 8478612392
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:42:58,120 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 14917 ms for 1 collections, 8481096608
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:43:31,975 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 33698 ms for 2 collections, 8484881064
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:43:47,339 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 15239 ms for 1 collections, 8485352000
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:44:24,136 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 36576 ms for 2 collections, 8489333048
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:45:01,187 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 36861 ms for 2 collections, 8491277224
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:45:22,650 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 21369 ms for 1 collections, 8493227920
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:45:59,412 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 36616 ms for 2 collections, 8494231408
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:47:27,072 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 87408 ms for 5 collections, 8497634840
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:48:03,811 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 36654 ms for 2 collections, 8499869936
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:50:15,419 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 131381 ms for 7 collections, 8504981328
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:52:05,398 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 109789 ms for 6 collections, 8505124200
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:54:25,825 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 140241 ms for 8 collections, 8503231888
used; max is 8506048512
 INFO [ScheduledTasks:1] 2014-06-19 19:56:13,206 GCInspector.java (line
116) GC for ConcurrentMarkSweep: 106587 ms for 7 collections, 784762904
used; max is 8506048512

Best regards,
Marcelo Valle.




Re: Batch of prepared statements exceeding specified threshold

Posted by Pavel Kogan <pa...@cortica.com>.
What a coincidence! Today it happened in my cluster of 7 nodes as well.

Regards,
  Pavel

