Posted to user@cassandra.apache.org by Anubhav Kale <An...@microsoft.com> on 2016/09/26 21:51:58 UTC

Repairs at scale in Cassandra 2.1.13

Hello,

We run Cassandra 2.1.13 (we don't have plans to upgrade yet). What is the best way to run repairs at scale (400 nodes, each holding ~600 GB) that actually works?

I'm considering doing subrange repairs (https://github.com/BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py), as I've heard from folks that incremental repairs simply don't work even in 3.x (yeah, that's a strong statement, but I heard it from multiple folks at the Summit).

Any guidance would be greatly appreciated!

Thanks,
Anubhav

Re: crash with OOM

Posted by Ben Slater <be...@instaclustr.com>.
That is a very large heap size for C* - most installations I've seen are
running in the 8-12GB heap range. Apparently G1GC copes better with larger
heaps, so that may help. However, you are probably better off digging a bit
deeper into what is using all that heap. Massive IN-clause lists? Massive
multi-partition batches? Massive partitions?

Especially given that it hit two nodes simultaneously, I would make a
rogue query my first point of investigation.
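For reference, shrinking the heap is just the two variables already quoted below; a sketch of a more conventional cassandra-env.sh setting (the exact values here are illustrative, not a recommendation for every workload):

```shell
# cassandra-env.sh -- illustrative values only; tune for your workload.
# Capping the heap well below 32 GB keeps compressed oops enabled and
# leaves most of the 256 GB box to the OS page cache, which Cassandra
# relies on heavily for reads.
MAX_HEAP_SIZE="12G"
# Young generation; commonly sized around 1/4 of the heap with CMS.
# (G1GC sizes the young gen itself, so this would be left unset there.)
HEAP_NEWSIZE="3G"
```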

Cheers
Ben

On Tue, 27 Sep 2016 at 17:49 xutom <xu...@126.com> wrote:
-- 
————————
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798

crash with OOM

Posted by xutom <xu...@126.com>.
Hi, all
I have a C* cluster with 12 nodes. My Cassandra version is 2.1.14. Two nodes just crashed, and the client now fails to export data at read consistency QUORUM. The following are logs from the failed nodes:

ERROR [SharedPool-Worker-159] 2016-09-26 20:51:14,124 Message.java:538 - Unexpected exception during request; channel = [id: 0xce43a388, /13.13.13.80:55536 :> /13.13.13.149:9042]
java.lang.AssertionError: null
        at org.apache.cassandra.transport.ServerConnection.applyStateTransition(ServerConnection.java:100) ~[apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:442) [apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335) [apache-cassandra-2.1.14.jar:2.1.14]
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) [netty-all-4.0.23.Final.jar:4.0.23.Final]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
        at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) [apache-cassandra-2.1.14.jar:2.1.14]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.1.14.jar:2.1.14]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
ERROR [SharedPool-Worker-116] 2016-09-26 20:51:14,125 JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [SharedPool-Worker-121] 2016-09-26 20:51:14,125 JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
ERROR [SharedPool-Worker-157] 2016-09-26 20:51:14,124 Message.java:538 - Unexpected exception during request; channel = [id: 0xce43a388, /13.13.13.80:55536 :> /13.13.13.149:9042]

My server has 256 GB of memory in total, so I set MAX_HEAP_SIZE to 60G in cassandra-env.sh:
MAX_HEAP_SIZE="60G"
HEAP_NEWSIZE="20G"
How can I solve this OOM?

RE: Repairs at scale in Cassandra 2.1.13

Posted by Anubhav Kale <An...@microsoft.com>.
Thanks!

For subrange repairs I have seen two approaches. For our specific requirement, we want to repair only a small set of keyspaces.

1. Use Thrift describe_local_ring(keyspace) to get the token ranges for a given node, split them for a given keyspace + table using describe_splits_ex, and call nodetool repair on the subranges. https://github.com/pauloricardomg/cassandra-list-subranges does it this way.

2. Get tokens using nodetool info -T, split those, and call nodetool repair with subranges. https://github.com/BrianGallew/cassandra_range_repair does it this way.

Can experts help me understand the nuances between these APIs and which one is better / more efficient? Since the first is keyspace-aware, it lets us target repairs at specific keyspaces more precisely, so I am leaning toward it at the moment.
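For what it's worth, the core of approach 2 is small enough to sketch. This is an illustrative outline (the function names are mine, not from either tool): take the tokens a node owns, split each range into subranges, and emit one `nodetool repair -st/-et` invocation per subrange. The wraparound handling mirrors what any such script must do for the range that crosses the ring's minimum token.

```python
# Illustrative sketch of approach 2 (names are mine, not from range_repair.py):
# split a node's token ranges into subranges and build nodetool invocations.

RING_MIN = -2**63          # Murmur3Partitioner tokens live in [-2**63, 2**63 - 1]
RING_MAX = 2**63 - 1
RING_SIZE = 2**64

def split_range(start, end, steps):
    """Split the token range (start, end] into `steps` contiguous subranges.

    Works modulo the ring size so the wraparound range (end <= start)
    is split correctly too.
    """
    span = (end - start) % RING_SIZE or RING_SIZE
    bounds = [start]
    for i in range(1, steps):
        token = start + span * i // steps
        # re-wrap into the signed 64-bit token space
        token = (token - RING_MIN) % RING_SIZE + RING_MIN
        bounds.append(token)
    bounds.append(end)
    return list(zip(bounds, bounds[1:]))

def repair_commands(start, end, steps, keyspace=None):
    """Yield one `nodetool repair -st X -et Y [keyspace]` argv per subrange."""
    for st, et in split_range(start, end, steps):
        cmd = ["nodetool", "repair", "-st", str(st), "-et", str(et)]
        if keyspace:
            cmd.append(keyspace)
        yield cmd
```

Each command would then be run serially (e.g. via subprocess), which is essentially what range_repair.py automates, along with retries and logging.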

Thanks!

From: Paulo Motta [mailto:pauloricardomg@gmail.com]
Sent: Wednesday, September 28, 2016 5:16 AM
To: user@cassandra.apache.org
Subject: Re: Repairs at scale in Cassandra 2.1.13




Re: Repairs at scale in Cassandra 2.1.13

Posted by Paulo Motta <pa...@gmail.com>.
There were a few streaming bugs fixed between 2.1.13 and 2.1.15 (see
CHANGES.txt for details), so I'd recommend upgrading to 2.1.15 to
avoid them.

2016-09-28 9:08 GMT-03:00 Alain RODRIGUEZ <ar...@gmail.com>:

Re: Repairs at scale in Cassandra 2.1.13

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Anubhav,


> I’m considering doing subrange repairs (https://github.com/
> BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py)
>

I used this script a lot, and quite successfully.

Another working option that people are using is:

https://github.com/spotify/cassandra-reaper

Alexander, a coworker, integrated an existing UI and made it compatible
with incremental repairs:

Incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
UI integration with incremental repairs on Reaper:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

as I’ve heard from folks that incremental repairs simply don’t work even in
> 3.x (Yeah, that’s a strong statement but I heard that from multiple folks
> at the Summit).
>

Alexander also gave a talk about repairs at the Summit (including
incremental repairs), and someone from Netflix gave a good one as well,
not covering incremental repairs but with some benchmarks and tips for
running repairs. You might want to check one of those (or both):

https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

I believe the videos haven't been released by DataStax yet; they probably
will be sometime soon.

Repair is something all companies with large setups struggle with:
Spotify built Reaper, and Netflix gave a talk on repairs presenting the
range_repair.py script and much more. But I know there is work going on
to improve things.

Meanwhile, given the load per node (600 GB is big, but not that huge) and
the number of nodes (400 is quite high), I would say the hardest part for
you will be the scheduling: avoiding harm to the cluster while making sure
every node gets repaired. I believe Reaper might be the better match in
your case, as from what I have heard it handles that quite well, though I
am not really sure.
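That scheduling concern can be sketched as a serial walk over the cluster, one subrange on one node at a time, so repairs never overlap. Everything in this sketch (the host list, the run_repair stub, the retry policy) is an assumption for illustration; Reaper does this properly, with segment state, back-pressure, and resumability.

```python
# Illustrative repair scheduler sketch; names and retry policy are assumptions.
import subprocess
import time

def run_repair(host, start, end, keyspace):
    """One subrange repair against one node; True on success."""
    cmd = ["nodetool", "-h", host, "repair",
           "-st", str(start), "-et", str(end), keyspace]
    return subprocess.run(cmd).returncode == 0

def schedule(hosts, subranges_for, keyspace, pause_s=5, runner=run_repair):
    """Walk the cluster serially, never overlapping two repairs.

    Retries each failed subrange once, records what still failed, and
    pauses between subranges to give the cluster breathing room.
    """
    failed = []
    for host in hosts:
        for start, end in subranges_for(host):
            if not (runner(host, start, end, keyspace)
                    or runner(host, start, end, keyspace)):
                failed.append((host, start, end))
            time.sleep(pause_s)
    return failed
```

The returned list of failed subranges is what makes "make sure every node gets repaired" checkable: rerun just those instead of restarting the whole cycle.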

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
