Posted to user@cassandra.apache.org by Jiri Horky <ho...@avast.com> on 2015/02/08 21:44:22 UTC

High GC activity on node with 4TB on data

Hi all,

we are seeing quite high GC pressure (in the old space, with the CMS GC algorithm)
on a node with 4TB of data. It runs C* 1.2.18 with 12G of heap memory
(2G for new space). The node runs fine for a couple of days until the GC
activity starts to rise and reaches about 15% of the C* activity, which
causes dropped messages and other problems.

Taking a look at a heap dump, there is about 8G used by SSTableReader
classes in org.apache.cassandra.io.compress.CompressedRandomAccessReader.

Is this expected, meaning we have just reached the limit of how much data a
single Cassandra instance can handle, or is it possible to tune it better?

Regards
Jiri Horky
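
For reference, a minimal sketch of how the old-generation pressure and the
heap contents described above can be inspected (standard JDK tools; the PID
and dump path are placeholders):

    # watch heap occupancy per generation and GC counts, sampling every 5s
    # (O = old gen % used, FGC = number of full/CMS collections)
    jstat -gcutil <cassandra-pid> 5000

    # capture a heap dump of live objects for offline analysis, e.g. in Eclipse MAT;
    # note that writing the dump pauses the JVM
    jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <cassandra-pid>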

Re: High GC activity on node with 4TB on data

Posted by Francois Richard <fr...@yahoo-inc.com>.
Hi Jiri,
We do run multiple nodes with 2TB to 4TB of data and we will usually see GC pressure when we create a lot of tombstones.
With Cassandra 2.0.x you would be able to see a log with the following pattern:
WARN [ReadStage:7] 2015-02-08 22:55:09,621 SliceQueryFilter.java (line 225) Read 939 live and 1017 tombstoned cells in SyncCore.ContactInformation (see tombstone_warn_threshold). 1000 columns was requested, slices=[-], delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
This basically indicates that you had some major deletions for a given row.
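
A rough sketch of how such warnings can be counted, assuming Cassandra 2.0.x
and the default log location (adjust the path for your install; the threshold
itself is tombstone_warn_threshold in cassandra.yaml):

    # how often reads crossed tombstone_warn_threshold
    grep -c "tombstoned cells" /var/log/cassandra/system.log

    # which column families the warnings point at
    grep "tombstoned cells" /var/log/cassandra/system.log \
        | grep -o "in [^ ]*" | sort | uniq -c | sort -rn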
Thanks,

FR

Re: High GC activity on node with 4TB on data

Posted by Arya Goudarzi <go...@gmail.com>.
Sorry to jump on this late. GC is one of my favorite topics. A while ago I
wrote a blog post about C* GC tuning and documented several issues that I
had experienced. It seems it has helped some people in the past, so I am
sharing it here:

http://aryanet.com/blog/cassandra-garbage-collector-tuning
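
Before tuning, it usually pays to capture GC logs so you can see which
generation is actually under pressure; a sketch of options that can be added
to cassandra-env.sh (the log path is just an example):

    # in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"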





-- 
Cheers,
-Arya

Re: High GC activity on node with 4TB on data

Posted by Jiri Horky <ho...@avast.com>.
Number of cores: 2x6Cores x 2(HT).

I do agree with you that the hardware is certainly oversized for
just one Cassandra instance, but we got a very good price since we ordered
several tens of the same nodes for a different project. That's why we use
it for multiple Cassandra instances.

Jirka H.



Re: High GC activity on node with 4TB on data

Posted by Eric Stevens <mi...@gmail.com>.
> each node has 256G of memory, 24x1T drives, 2x Xeon CPU

I don't have first hand experience running Cassandra on such massive
hardware, but it strikes me that these machines are dramatically oversized
to be good candidates for Cassandra (though I wonder how many cores are in
those CPUs; I'm guessing closer to 18 than 2 based on the other hardware).

A larger cluster of smaller hardware would be a much better shape for
Cassandra.  Or several clusters of smaller hardware since you're running
multiple instances on this hardware - best practices have one instance per
host no matter the hardware size.


Re: High GC activity on node with 4TB on data

Posted by Jiri Horky <ho...@avast.com>.
Hi Chris,

On 02/09/2015 04:22 PM, Chris Lohfink wrote:
>  - number of tombstones - how can I reliably find it out?
> https://github.com/spotify/cassandra-opstools
> https://github.com/cloudian/support-tools
thanks.
>
> If you're not getting much compression it may be worth trying to disable it;
> it may contribute, but it's very unlikely that it's the cause of the GC
> pressure itself.
>
> 7000 sstables but STCS? Sounds like compactions couldn't keep up.  Do
> you have a lot of pending compactions (nodetool)?  You may want to
> increase your compaction throughput (nodetool) to see if you can catch
> up a little; doing reads across that many sstables causes a lot of heap
> overhead.  You may even need to take more drastic measures if it can't
> catch back up.
I am sorry, I was wrong. We actually do use LCS (the switch was done
recently). There are almost no pending compactions. We have increased
the sstable size to 768M, so it should help as well.
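
For reference, a sketch of what such a change looks like in CQL3 on 1.2 (the
keyspace, table and node address here are made up):

    echo "ALTER TABLE my_keyspace.my_cf
          WITH compaction = {'class': 'LeveledCompactionStrategy',
                             'sstable_size_in_mb': 768};" | cqlsh <node-address>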

>
> May also be good to check `nodetool cfstats` for very wide partitions.  
There are basically none, this is fine.

It seems that the problem really comes from having so much data in so
many sstables, so the
org.apache.cassandra.io.compress.CompressedRandomAccessReader classes
consume more memory than 0.75*HEAP_SIZE, which triggers the CMS over
and over.

We have turned off the compression and so far, the situation seems to be
fine.
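
For completeness, a sketch of disabling sstable compression per table in CQL3
(again with made-up names); note that existing sstables stay compressed until
they are rewritten by compaction:

    echo "ALTER TABLE my_keyspace.my_cf
          WITH compression = {'sstable_compression': ''};" | cqlsh <node-address>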

Cheers
Jirka H.

>
> There's a good chance that if you're under load and have over an 8GB heap,
> your GCs could use tuning.  The bigger the nodes, the more manual tweaking
> they will require to get the most out of them;
> https://issues.apache.org/jira/browse/CASSANDRA-8150 also has
> some ideas.
>
> Chris


Re: High GC activity on node with 4TB on data

Posted by Chris Lohfink <cl...@gmail.com>.
 - number of tombstones - how can I reliably find it out?
https://github.com/spotify/cassandra-opstools
https://github.com/cloudian/support-tools

If you're not getting much compression it may be worth trying to disable it;
it may contribute, but it's very unlikely that it's the cause of the GC
pressure itself.

7000 sstables but STCS? Sounds like compactions couldn't keep up.  Do you
have a lot of pending compactions (nodetool)?  You may want to increase
your compaction throughput (nodetool) to see if you can catch up a little;
doing reads across that many sstables causes a lot of heap overhead.  You may
even need to take more drastic measures if it can't catch back up.

May also be good to check `nodetool cfstats` for very wide partitions.
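
A rough sketch of those checks (the throughput value is only an example, and
cfstats field names vary slightly between versions):

    # compaction backlog and what is currently compacting
    nodetool compactionstats

    # temporarily raise the compaction throughput cap (MB/s) while catching up
    nodetool setcompactionthroughput 64

    # look for column families with very large maximum row sizes
    nodetool cfstats | egrep "Column Family:|Compacted row maximum size"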

There's a good chance that if you're under load and have over an 8GB heap,
your GCs could use tuning.  The bigger the nodes, the more manual tweaking
they will require to get the most out of them;
https://issues.apache.org/jira/browse/CASSANDRA-8150 also has some ideas.
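
As a sketch, these are the CMS-related knobs in cassandra-env.sh that
typically get adjusted on larger heaps (the values below are illustrative,
not recommendations):

    # in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"   # start CMS earlier
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"      # only use this trigger
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"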

Chris


Re: High GC activity on node with 4TB on data

Posted by Jiri Horky <ho...@avast.com>.
Hi all,

thank you all for the info.

To answer the questions:
 - we have 2 DCs with 5 nodes in each; each node has 256G of memory,
24x1T drives, 2x Xeon CPU - there are multiple cassandra instances
running for different projects. The node itself is powerful enough.
 - there are 2 keyspaces, one with 3 replicas per DC, one with 1 replica per
DC (because of the amount of data and because it serves more or less like a
cache)
 - there are about 4k/s Request-response, 3k/s Read and 2k/s Mutation
requests - the numbers are a sum over all nodes
 - we use STCS (LCS would be quite IO heavy for this amount of data)
 - number of tombstones - how can I reliably find it out?
 - the biggest CF (3.6T per node) has 7000 sstables

Now, I understand that the best practice for Cassandra is to run "with
the minimum size of heap which is enough", which in this case we thought
is about 12G - there is always 8G consumed by the SSTable readers.
Also, I thought that a high number of tombstones creates pressure in the
new space (which can then cause pressure in the old space as well), but
this is not what we are seeing. We see continuous GC activity in the Old
generation only.

Also, I noticed that the biggest CF has a compression factor of 0.99, which
basically means that the data already come in compressed. Do you think that
turning off the compression would help with memory consumption?

Also, I think that tuning CMSInitiatingOccupancyFraction=75 might help
here, as it seems that 8G is something that Cassandra needs for
bookkeeping this amount of data, and that this was slightly above the 75%
limit, which triggered the CMS again and again.

I will definitely have a look at the presentation.

Regards
Jiri Horky



Re: High GC activity on node with 4TB on data

Posted by Mark Reddy <ma...@gmail.com>.
Hey Jiri,

While I don't have any experience running 4TB nodes (yet), I would
recommend taking a look at a presentation by Aaron Morton on large nodes:
http://planetcassandra.org/blog/cassandra-community-webinar-videoslides-large-nodes-with-cassandra-by-aaron-morton/
to see if you can glean anything from that.

I would note that at the start of his talk he mentions that in version 1.2
we can now talk about nodes around 1 - 3 TB in size, so if you are storing
anything more than that you are getting into very specialised use cases.

If you could provide us with some more information about your cluster setup
(No. of CFs, read/write patterns, do you delete / update often, etc.) that
may help in getting you to a better place.


Regards,
Mark


Re: High GC activity on node with 4TB on data

Posted by Kevin Burton <bu...@spinn3r.com>.
Do you have a lot of individual tables?  Or lots of small compactions?

I think the general consensus is that (at least for Cassandra), 8GB heaps
are ideal.

If you have lots of small tables it's a known anti-pattern (I believe)
because the Cassandra internals could do a better job of handling the
in-memory metadata representation.

I think this has been improved in 2.0 and 2.1 though, so the fact that
you're on 1.2.18 could exacerbate the issue.  You might want to consider an
upgrade (though that has its own issues as well).
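
A sketch of pinning the heap explicitly in cassandra-env.sh instead of
relying on the auto-sizing logic (the numbers follow the suggestion above,
not a universal rule):

    # in cassandra-env.sh
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"   # a common rule of thumb is ~100MB per physical core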



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: High GC activity on node with 4TB on data

Posted by Colin <co...@clark.ws>.
The most data I put on a node with spinning disk is 1TB.

What are the machine specs? CPU, memory, etc., and what is the read/write pattern (heavy ingest rate / heavy read rate), and how long do you keep data in the cluster?

--
Colin Clark 
+1 612 859 6129
Skype colin.p.clark
