
Posted to user@cassandra.apache.org by Gene Robichaux <Ge...@match.com> on 2014/09/26 18:52:27 UTC

Repair taking long time

I am fairly new to Cassandra. We have a 9-node cluster: 5 nodes in one DC and 4 in another.

Running a repair on a large column family seems to be moving much slower than I expect.

Looking at nodetool compactionstats, it indicates that the Validation phase is running and that the total bytes is 4.5T (4505336278756).

This is a very large CF. The process has been running for 2.5 hours and has processed 71G (71950433062). That rate is about 28.4 GB per hour. At this rate it will take 158 hours, just shy of 1 week.

Is this reasonable? This is my first large repair and I am wondering if this is normal for a CF of this size. Seems like a long time to me.

Is it possible to tune this process to speed it up? Is there something in my configuration that could be causing this slow performance? I am running HDDs (not SSDs) in a JBOD configuration.
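The ETA arithmetic above can be reproduced from the raw byte counts that nodetool compactionstats reports. This is a minimal sketch that assumes the validation rate stays constant, which may not hold on a busy HDD node. The function name is our own.

```python
# Rough ETA for a running validation compaction, assuming the observed rate stays constant.

def repair_eta_hours(completed_bytes: int, total_bytes: int, elapsed_hours: float) -> float:
    """Return the estimated hours remaining by extrapolating the average rate so far."""
    if completed_bytes <= 0 or elapsed_hours <= 0:
        raise ValueError("need some progress and elapsed time to estimate a rate")
    rate = completed_bytes / elapsed_hours  # bytes per hour
    return (total_bytes - completed_bytes) / rate


# Byte counts as reported in the post above
completed = 71_950_433_062
total = 4_505_336_278_756
elapsed = 2.5  # hours

rate_gb_per_hour = completed / elapsed / 1e9
remaining = repair_eta_hours(completed, total, elapsed)
print(f"rate: {rate_gb_per_hour:.1f} GB/h")
print(f"remaining: {remaining:.0f} h (about {remaining / 24:.1f} days)")
```

Using the full byte counts gives about 28.8 GB/h and roughly 154 more hours, or about 156 hours in total. The 28.4 GB/h figure comes from rounding 71.95G down to 71G. Either way, the repair will take just under a week.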



Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273

Re: Repair taking long time

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Well, in that case, you may want to roll your own script for doing
constant repairs of your cluster, and extend gc_grace_seconds so
you can repair the whole cluster before the tombstones are cleared.
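The "roll your own script" suggestion can be sketched as a dry-run planner. It builds one `nodetool -h <host> repair -pr <keyspace> <table>` command per node, to be run one node at a time, and checks whether a full pass fits inside gc_grace_seconds. The host names, keyspace, table, per-node hour estimates, and 0.8 safety factor are all hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


@dataclass
class Node:
    host: str
    est_repair_hours: float  # hypothetical per-node estimate, e.g. from compactionstats progress


def rolling_repair_plan(nodes, keyspace, table):
    """One 'nodetool repair -pr' command per node, meant to be run one node at a time."""
    return [["nodetool", "-h", n.host, "repair", "-pr", keyspace, table] for n in nodes]


def fits_in_gc_grace(nodes, gc_grace_seconds, safety_factor=0.8):
    """True if one full sequential pass over all nodes ends well inside gc_grace_seconds."""
    total_seconds = sum(n.est_repair_hours for n in nodes) * SECONDS_PER_HOUR
    return total_seconds <= gc_grace_seconds * safety_factor


# Hypothetical hosts, keyspace and table names
nodes = [Node(f"10.0.0.{i}", est_repair_hours=20.0) for i in range(1, 10)]
for cmd in rolling_repair_plan(nodes, "my_keyspace", "my_table"):
    print(" ".join(cmd))  # dry run: print instead of subprocess.run(cmd, check=True)
print("fits in default gc_grace:", fits_in_gc_grace(nodes, DEFAULT_GC_GRACE_SECONDS))
```

Now suppose every node took as long as the validation in the question, about 156 hours each. A sequential pass over nine nodes would then take roughly 58 days, which does not fit even a 34-day gc_grace_seconds.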

On Fri, Sep 26, 2014 at 11:15 AM, Gene Robichaux
<Ge...@match.com> wrote:
> Using their community edition......no support (yet!) :(



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

RE: Repair taking long time

Posted by Gene Robichaux <Ge...@match.com>.

Using their community edition......no support (yet!) :(

Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273

-----Original Message-----
From: jonathan.haddad@gmail.com [mailto:jonathan.haddad@gmail.com] On Behalf Of Jonathan Haddad
Sent: Friday, September 26, 2014 12:58 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

If you're using DSE you might want to contact Datastax support, rather than the ML.

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Repair taking long time

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

If you're using DSE you might want to contact Datastax support, rather
than the ML.

On Fri, Sep 26, 2014 at 10:52 AM, Gene Robichaux
<Ge...@match.com> wrote:
> I am on DSE 4.0.3 which is 2.0.7.
>
>
>
> If 4.5.1 is NOT 2.1. I guess an upgrade will not buy me much…..
>
>
>
> The bad thing is that table is not our largest….. :(



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

RE: Repair taking long time

Posted by Gene Robichaux <Ge...@match.com>.

I am on DSE 4.0.3 which is 2.0.7.

If 4.5.1 is NOT 2.1, I guess an upgrade will not buy me much…..

The bad thing is that table is not our largest….. :(


Gene Robichaux
Manager, Database Operations
Match.com
8300 Douglas Avenue I Suite 800 I Dallas, TX  75225
Phone: 214-576-3273

From: Brice Dutheil [mailto:brice.dutheil@gmail.com]
Sent: Friday, September 26, 2014 12:47 PM
To: user@cassandra.apache.org
Subject: Re: Repair taking long time

Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice


Re: Repair taking long time

Posted by Brice Dutheil <br...@gmail.com>.

Unfortunately DSE 4.5.0 is still on 2.0.x

-- Brice

On Fri, Sep 26, 2014 at 7:40 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
> This problem is addressed in 2.1.
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>

Re: Repair taking long time

Posted by Bryan Talbot <br...@playnext.com>.

With a 4.5 TB table and just 4 nodes, repair will likely take forever for
any version.

-Bryan


On Fri, Sep 26, 2014 at 10:40 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
> This problem is addressed in 2.1.
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>

Re: Repair taking long time

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Are you using Cassandra 2.0 & vnodes?  If so, repair takes forever.
This problem is addressed in 2.1.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux
<Ge...@match.com> wrote:
> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in
> another.
>
> Running a repair on a large column family seems to be moving much slower
> than I expect.
>
> Looking at nodetool compaction stats it indicates the Validation phase is
> running that the total bytes is 4.5T (4505336278756).
>
> This is a very large CF. The process has been running for 2.5 hours and has
> processed 71G (71950433062). That rate is about 28.4 GB per hour. At this
> rate it will take 158 hours, just shy of 1 week.
>
> Is this reasonable? This is my first large repair and I am wondering if this
> is normal for a CF of this size. Seems like a long time to me.
>
> Is it possible to tune this process to speed it up? Is there something in my
> configuration that could be causing this slow performance? I am running
> HDDs, not SSDs in a JBOD configuration.



-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade

Re: Repair taking long time

Posted by Ben Bromhead <be...@instaclustr.com>.

use https://github.com/BrianGallew/cassandra_range_repair
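The linked script is built around subrange repair. Instead of repairing a node's whole range in one session, it repairs many small slices with `nodetool repair -st <start> -et <end>`. Below is a minimal sketch of only the slicing step. It assumes Murmur3Partitioner token bounds and is not that script's code. In practice the range to split would come from the node's own ranges (for example, from `nodetool describering <keyspace>`), and the keyspace and table names here are hypothetical.

```python
MIN_TOKEN = -(2**63)  # Murmur3Partitioner token bounds (assumed partitioner)
MAX_TOKEN = 2**63 - 1


def split_range(start: int, end: int, steps: int):
    """Split the token range (start, end] into up to `steps` contiguous subranges.

    Only non-wrapping ranges (start < end) are handled in this sketch.
    """
    if start >= end:
        raise ValueError("wrapping ranges are not handled here")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    width = end - start
    bounds = [start + (width * i) // steps for i in range(steps + 1)]
    # Drop empty pieces that appear when steps exceeds the range width
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def subrange_repair_commands(start, end, steps, keyspace, table):
    """Build one 'nodetool repair -st/-et' command per subrange (dry run only)."""
    return [
        ["nodetool", "repair", "-st", str(lo), "-et", str(hi), keyspace, table]
        for lo, hi in split_range(start, end, steps)
    ]


print(split_range(0, 100, 4))
for cmd in subrange_repair_commands(MIN_TOKEN, MAX_TOKEN, 3, "my_keyspace", "my_table"):
    print(" ".join(cmd))
```

Python integers do not overflow, so the full 64-bit token space can be split directly. The smaller slices keep each validation and streaming step short, and a failed slice can be retried on its own.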



On 30 September 2014 05:24, Ken Hancock <ke...@schange.com> wrote:

>
> On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>>
>> As an aside, you "just lose" with vnodes and clusters of the size. I
>> presume you plan to grow over appx 9 nodes per DC, in which case you
>> probably do want vnodes enabled.
>>
>
> I typically only see discussion on vnodes vs. non-vnodes, but it seems to
> me that might be more important to discuss the number of vnodes per node.
> A small cluster having 256 vnodes/node is unwise given some of the
> sequential operations that are still done.  Even if operations were done in
> parallel, having a 256x increase in parallelization seems an equally bad
> choice.
>
> I've never seen any discussion on how many vnodes per node might be an
> appropriate answer based a planned cluster size -- does such a thing exist?
>
> Ken


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
<http://twitter.com/instaclustr> | +61 415 936 359

Re: Repair taking long time

Posted by Ken Hancock <ke...@schange.com>.

On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli <rc...@eventbrite.com> wrote:

>
> As an aside, you "just lose" with vnodes and clusters of the size. I
> presume you plan to grow over appx 9 nodes per DC, in which case you
> probably do want vnodes enabled.
>

I typically only see discussion on vnodes vs. non-vnodes, but it seems to
me that it might be more important to discuss the number of vnodes per node.
A small cluster having 256 vnodes/node is unwise given some of the
sequential operations that are still done.  Even if operations were done in
parallel, having a 256x increase in parallelization seems an equally bad
choice.

I've never seen any discussion on how many vnodes per node might be an
appropriate answer based on a planned cluster size -- does such a thing exist?
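The 256x point above is plain multiplication. This sketch counts the ranges on the ring and the fixed per-range cost a node pays if its ranges are repaired one after another, which is the 2.0 behavior described earlier in the thread. The 60-second per-session overhead is a made-up placeholder, not a measurement. The sketch does not answer the sizing question, because the thread offers no rule for it.

```python
def ring_ranges(nodes: int, num_tokens: int) -> int:
    """Each token on the ring ends one range, so the ring has nodes * num_tokens ranges."""
    return nodes * num_tokens


def sequential_overhead_hours(num_tokens: int, per_session_overhead_s: float) -> float:
    """Fixed cost paid once per range when a node's ranges are repaired one after another."""
    return num_tokens * per_session_overhead_s / 3600


NODES = 9  # the cluster in this thread
PER_SESSION_OVERHEAD_S = 60.0  # hypothetical placeholder, not a measured value

for tokens in (1, 16, 64, 256):
    print(f"num_tokens={tokens:>3}: ranges on ring={ring_ranges(NODES, tokens):>4}, "
          f"fixed overhead per node ~{sequential_overhead_hours(tokens, PER_SESSION_OVERHEAD_S):.1f} h")
```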

Ken

Re: Repair taking long time

Posted by Rahul Neelakantan <ra...@rahul.be>.

What is the recommended value for num_tokens? I am asking because of the issue with repairs running sequentially, one token range after another.

Rahul Neelakantan

> On Sep 29, 2014, at 2:29 PM, Robert Coli <rc...@eventbrite.com> wrote:
> 
>> On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux <Ge...@match.com> wrote:
>> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in another.
>> 
>>  
>> 
>> Running a repair on a large column family seems to be moving much slower than I expect.
>> 
> 
> Unfortunately, as others have mentioned, the slowness/broken-ness of repair is a long running (groan!) issue and therefore currently expected. 
> 
> At this time, I do not recommend upgrading to 2.1 in production to attempt to fix it. I am also broadly skeptical that it as fixed in 2.1 as all that.
> 
> Once can increase gc_grace_seconds to 34 days [1] and repair once a month, which should help make repair slightly more tractable.
> 
> For now you should probably evaluate which of your column families you *absolutely must* repair (because you do DELETE like operations in them, etc.) and only repair those.
> 
> As an aside, you "just lose" with vnodes and clusters of the size. I presume you plan to grow over appx 9 nodes per DC, in which case you probably do want vnodes enabled.
> 
> One note :
>>  Looking at nodetool compaction stats it indicates the Validation phase is running that the total bytes is 4.5T (4505336278756).
> 
> This is the uncompressed size, I'm betting your actual on disk size is closer to 2T? Even though 2.0 has improved performance for nodes with lots of data, 2T per node is still relatively "fat" for a Cassandra node.
> 
> 
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5850

Re: Repair taking long time

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux <Ge...@match.com>
wrote:

>  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
> 4 in another.
>
> Running a repair on a large column family seems to be moving much slower
> than I expect.
>

Unfortunately, as others have mentioned, the slowness/broken-ness of repair
is a long running (groan!) issue and therefore currently expected.

At this time, I do not recommend upgrading to 2.1 in production to attempt
to fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.

One can increase gc_grace_seconds to 34 days [1] and repair once a month,
which should help make repair slightly more tractable.
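Here is the arithmetic behind "34 days, repair once a month". gc_grace_seconds is set in seconds, so 34 days is 2,937,600. The safety check is our own conservative rule of thumb, not something stated in the thread or the ticket. Suppose a tombstone lands just after its range was repaired. The next repair of that range may then come one full interval later and finish only at the end of that pass, so interval plus pass duration must stay within gc_grace.

```python
from datetime import timedelta

# 34 days, expressed in seconds as gc_grace_seconds expects
GC_GRACE = timedelta(days=34)
gc_grace_seconds = int(GC_GRACE.total_seconds())


def schedule_is_safe(interval: timedelta, repair_duration: timedelta, gc_grace: timedelta) -> bool:
    """Conservative check: the worst-case gap between a tombstone being written and
    its range finishing its next repair must not exceed gc_grace."""
    return interval + repair_duration <= gc_grace


print(gc_grace_seconds)
print(schedule_is_safe(timedelta(days=31), timedelta(days=2), GC_GRACE))
print(schedule_is_safe(timedelta(days=31), timedelta(days=7), GC_GRACE))
```

A monthly pass that completes in two days fits. A pass that takes the week estimated at the start of this thread does not. The setting itself is applied per table in CQL with `ALTER TABLE <keyspace>.<table> WITH gc_grace_seconds = 2937600;`.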

For now you should probably evaluate which of your column families you
*absolutely must* repair (because you do DELETE-like operations in them,
etc.) and only repair those.
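The triage above can be written as a simple filter. The selection criterion (does the table see DELETE-like writes?) comes from the advice above. The table inventory and the `has_deletes` flag are hypothetical; in practice they come from knowing the application's write patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    keyspace: str
    name: str
    has_deletes: bool  # hypothetical flag: does the application issue DELETE-like writes here?


def tables_to_repair(tables):
    """Keep only tables where a missed tombstone could let deleted data reappear."""
    return [t for t in tables if t.has_deletes]


# Hypothetical inventory
inventory = [
    TableInfo("app", "user_profiles", has_deletes=True),
    TableInfo("app", "page_views", has_deletes=False),
    TableInfo("app", "messages", has_deletes=True),
]

for t in tables_to_repair(inventory):
    print(" ".join(["nodetool", "repair", "-pr", t.keyspace, t.name]))  # dry run
```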

As an aside, you "just lose" with vnodes and clusters of this size. I
presume you plan to grow beyond approximately 9 nodes per DC, in which case you
probably do want vnodes enabled.

One note :

>  Looking at nodetool compaction stats it indicates the Validation phase
> is running that the total bytes is 4.5T (4505336278756).

This is the uncompressed size, I'm betting your actual on disk size is
closer to 2T? Even though 2.0 has improved performance for nodes with lots
of data, 2T per node is still relatively "fat" for a Cassandra node.
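The guess above relates the uncompressed validation size to the on-disk size through the compression ratio. For a given table, nodetool cfstats should report this ratio as "SSTable Compression Ratio". This sketch uses only the 4.5T figure from the thread and the ~2T guess; the 0.44 ratio is derived from that guess, not measured.

```python
def compression_ratio(on_disk_bytes: float, uncompressed_bytes: float) -> float:
    """Fraction of the uncompressed size actually stored on disk (lower means better compression)."""
    return on_disk_bytes / uncompressed_bytes


def estimated_on_disk(uncompressed_bytes: float, ratio: float) -> float:
    """On-disk size implied by a given compression ratio."""
    return uncompressed_bytes * ratio


validation_total = 4_505_336_278_756  # uncompressed bytes from compactionstats

# A ~2T on-disk guess implies a ratio of roughly 0.44
print(f"{compression_ratio(2e12, validation_total):.2f}")
print(f"{estimated_on_disk(validation_total, 0.44) / 1e12:.2f} TB")
```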

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5850