Posted to user@cassandra.apache.org by Jan Kesten <ja...@dg6obo.de> on 2014/05/02 09:29:46 UTC

repair -pr does not return

Hello everyone,

I'm running a cassandra cluster with 2.0.6 and 6 nodes. As far as I
know, routine repairs are still mandatory for handling tombstones - even
though I noticed that the cluster now does a "snapshot repair" by default.

Now my cluster has been running for a while and has a load of about 200 GB
per node - running "nodetool repair -pr" on one of the nodes seems to run
forever; right now it has been running for 2 complete days and has not returned.

Any suggestions?

Thanks in advance,
Jan



Re: repair -pr does not return

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, May 2, 2014 at 12:29 AM, Jan Kesten <ja...@dg6obo.de> wrote:

> I'm running a cassandra cluster with 2.0.6 and 6 nodes. As far as I know,
> routine repairs are still mandatory for handling tombstones - even though I
> noticed that the cluster now does a "snapshot repair" by default.
>
> Now my cluster has been running for a while and has a load of about 200 GB per
> node - running "nodetool repair -pr" on one of the nodes seems to run forever;
> right now it has been running for 2 complete days and has not returned.
>

https://issues.apache.org/jira/browse/CASSANDRA-5220

The reports I am getting on this list and in #cassandra about the newly
re-written repair in the 2.0.x line, with vnodes on a real-sized data set, are
that it often does not work, and if it does work, not in tractable time. As
other posters have said, it is continually being fixed and improved. If I
were you, I would consider increasing gc_grace_seconds to something like 34
days until repair starts working more efficiently with vnodes.

https://issues.apache.org/jira/browse/CASSANDRA-5850
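
If you do raise gc_grace_seconds, note that it is a per-table setting. A
minimal sketch via cqlsh (keyspace and table names are placeholders; 34 days
is 2937600 seconds):

    echo "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 2937600;" | cqlsh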

=Rob

Re: repair -pr does not return

Posted by Artur Kronenberg <ar...@openmarket.com>.
Hi,

to be honest, 2 days for 200 GB nodes doesn't sound too unreasonable to me
(depending on your hardware, of course). We were running a ~20 GB cluster
with regular hard drives (no SSDs) and our first repair ran for a day as well,
if I recall correctly. We have since improved our hardware and got it down to
a couple of hours (~5h for all nodes triggering a -pr repair).

As far as I know you can use nodetool compactionstats and nodetool
netstats to check for activity on your repairs. There is a chance that it is
hanging, but also a chance that it just really takes quite a long time.
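
For example (stock 2.0 nodetool; output will obviously vary):

    nodetool compactionstats   # repair shows up as 'Validation' compactions
    nodetool netstats          # active streams between the nodes being repaired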

Cheers,

-- artur

On 02/05/14 09:12, Jan Kesten wrote:
> Hi Duncan,
>
>> is it actually doing something or does it look like it got stuck?  
>> 2.0.7 has a fix for a getting stuck problem.
>
> it starts with sending merkle trees and streaming for some time (some
> hours in fact) and then just seems to hang. So I'll try to update and
> see if that solves the issue. Thanks for that hint!
>
> Cheers,
> Jan
>
>


Re: Backup procedure

Posted by Patricia Gorla <pa...@thelastpickle.com>.
Artur,

Replies inline.

On Fri, May 2, 2014 at 10:42 AM, Artur Kronenberg <
artur.kronenberg@openmarket.com> wrote:

> we are running a 7 node cluster with an RF of 5. Each node holds about 70%
> of the data and we are now wondering about the backup process.
>

What are you using for a backup process at the moment? Or, even just your
application stack. If you're using Amazon's AWS it is simple to get started
with a project like tablesnap <https://github.com/JeremyGrosser/tablesnap>,
which listens for new sstables and uploads them to S3.

You can also take snapshots of the data on each node with 'nodetool
snapshot', and move the data manually.
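
A minimal sketch of the manual route (keyspace, table and paths here are
just placeholders for a default install):

    nodetool snapshot -t backup_20140502 my_keyspace
    # hard-linked snapshot files appear under
    #   /var/lib/cassandra/data/my_keyspace/<table>/snapshots/backup_20140502/
    rsync -a /var/lib/cassandra/data/my_keyspace/my_table/snapshots/backup_20140502/ \
        backup-host:/backups/node1/my_keyspace/my_table/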


>  1. Is there a best practice procedure or a tool that we can use to have
> one backup that holds 100 % of the data or is it necessary for us to take
> multiple backups.
>

Backups of a distributed system generally refer to capturing the state of
the database at a particular point in time. The size and spread of your data
will be the limiting factor in having one backup: you can store the data from
each node on a single computer, you just won't be able to combine the data
into one node without some extra legwork.


> 2. If we have to use multiple backups, is there a way to combine them? We
> would like to be able to start up a 1 node cluster that holds 100% of data
> if necessary. Can we just chug all sstables into the data directory and
> cassandra will figure out the rest?
>


> 4. If all of the above would work, could we in case of emergency set up a
> massive 1-node cluster that holds 100% of the data and repair the rest of
> our cluster based off of it? E.g. have the 1 node run with the correct data,
> and then hook it into our existing cluster and call repair on it to restore
> data on the rest of our nodes?
>

You could bulk load the sstable data into a smaller cluster using the
'sstableloader' tool. I gave a webinar
(https://www.youtube.com/watch?v=00weETpk3Yo) for Planet Cassandra a few
months ago about how to backfill data into your cluster; that could help here.
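
As a rough sketch (host addresses and paths are placeholders; the directory
you point sstableloader at must end in <keyspace>/<table>):

    sstableloader -d 10.0.0.1,10.0.0.2 /backups/restore/my_keyspace/my_table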

3. How do we handle the commitlog files from all of our nodes? Given we'd
> like to restore to a certain point in time and we have all the commitlogs,
> can we have commitlogs from multiple locations in the commitlog folder and
> cassandra will pick and execute the right thing?
>

You'll want to use 'nodetool drain'
(http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsDrain.html)
beforehand to avoid this issue. This makes the node unavailable for writes and
flushes the memtables, so the commitlog does not need to be replayed on restart.
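
Roughly (a sketch, not a full backup script):

    nodetool drain    # flush memtables; the node stops accepting writes until restart
    # ... copy the data directories or a snapshot off the node here ...
    # then restart cassandra to bring the node back online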

Cheers,
-- 
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Backup procedure

Posted by Chris Burroughs <ch...@gmail.com>.
It's also good to note that only the Data files are compressed already.
Depending on your data, the Index and other files may be a significant
percentage of the total on-disk data.
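
For example, you can eyeball the split on one node (the path assumes a
default install; the component names follow the 2.0 sstable layout):

    cd /var/lib/cassandra/data/my_keyspace/my_table
    du -ch *-Data.db  | tail -n 1   # the only component cassandra compresses
    du -ch *-Index.db | tail -n 1   # index files are stored uncompressed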

On 05/02/2014 01:14 PM, tommaso barbugli wrote:
> In my tests, compressing sstables with lzop (with cassandra compression
> turned on) resulted in approx. 50% smaller files.
> That's probably because the chunks of data compressed by lzop are way bigger
> than the average size of writes performed on Cassandra (not sure how data
> is compressed but I guess it is done per single cell so unless one stores)
>
>
> 2014-05-02 19:01 GMT+02:00 Robert Coli <rc...@eventbrite.com>:
>
>> On Fri, May 2, 2014 at 2:07 AM, tommaso barbugli <tb...@gmail.com> wrote:
>>
>>> If you are thinking about using Amazon S3 storage I wrote a tool that
>>> performs snapshots and backups on multiple nodes.
>>> Backups are stored compressed on S3.
>>> https://github.com/tbarbugli/cassandra_snapshotter
>>>
>>
>> https://github.com/JeremyGrosser/tablesnap
>>
>> SSTables in Cassandra are compressed by default; if you are re-compressing
>> them you may just be wasting CPU... :)
>>
>> =Rob
>>
>>
>


Re: Backup procedure

Posted by tommaso barbugli <tb...@gmail.com>.
In my tests, compressing sstables with lzop (with cassandra compression
turned on) resulted in approx. 50% smaller files.
That's probably because the chunks of data compressed by lzop are way bigger
than the average size of writes performed on Cassandra (not sure how data
is compressed but I guess it is done per single cell so unless one stores)
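
For example, something along these lines before shipping a snapshot off the
node (paths and the snapshot tag are placeholders):

    cd /var/lib/cassandra/data/my_keyspace/my_table/snapshots
    tar cf - backup_20140502 | lzop > /backups/my_table_backup_20140502.tar.lzo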


2014-05-02 19:01 GMT+02:00 Robert Coli <rc...@eventbrite.com>:

> On Fri, May 2, 2014 at 2:07 AM, tommaso barbugli <tb...@gmail.com> wrote:
>
>> If you are thinking about using Amazon S3 storage I wrote a tool that
>> performs snapshots and backups on multiple nodes.
>> Backups are stored compressed on S3.
>> https://github.com/tbarbugli/cassandra_snapshotter
>>
>
> https://github.com/JeremyGrosser/tablesnap
>
> SSTables in Cassandra are compressed by default; if you are re-compressing
> them you may just be wasting CPU... :)
>
> =Rob
>
>

Re: Backup procedure

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, May 2, 2014 at 2:07 AM, tommaso barbugli <tb...@gmail.com> wrote:

> If you are thinking about using Amazon S3 storage I wrote a tool that
> performs snapshots and backups on multiple nodes.
> Backups are stored compressed on S3.
> https://github.com/tbarbugli/cassandra_snapshotter
>

https://github.com/JeremyGrosser/tablesnap

SSTables in Cassandra are compressed by default; if you are re-compressing
them you may just be wasting CPU... :)
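
For reference, this refers to the per-table compression setting, which you can
inspect with DESCRIBE TABLE in cqlsh. A sketch of changing it (names are
placeholders; LZ4Compressor is just one of the available compressors):

    echo "ALTER TABLE my_keyspace.my_table
          WITH compression = {'sstable_compression': 'LZ4Compressor'};" | cqlsh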

=Rob

Re: Backup procedure

Posted by tommaso barbugli <tb...@gmail.com>.
If you are thinking about using Amazon S3 storage I wrote a tool that
performs snapshots and backups on multiple nodes.
Backups are stored compressed on S3.
https://github.com/tbarbugli/cassandra_snapshotter

Cheers,
Tommaso


2014-05-02 10:42 GMT+02:00 Artur Kronenberg <artur.kronenberg@openmarket.com>:

> Hi,
>
> we are running a 7 node cluster with an RF of 5. Each node holds about 70%
> of the data and we are now wondering about the backup process.
>
> 1. Is there a best practice procedure or a tool that we can use to have
> one backup that holds 100 % of the data or is it necessary for us to take
> multiple backups.
>
> 2. If we have to use multiple backups, is there a way to combine them? We
> would like to be able to start up a 1 node cluster that holds 100% of data
> if necessary. Can we just chug all sstables into the data directory and
> cassandra will figure out the rest?
>
> 3. How do we handle the commitlog files from all of our nodes? Given we'd
> like to restore to a certain point in time and we have all the commitlogs,
> can we have commitlogs from multiple locations in the commitlog folder and
> cassandra will pick and execute the right thing?
>
> 4. If all of the above would work, could we in case of emergency set up a
> massive 1-node cluster that holds 100% of the data and repair the rest of
> our cluster based off of it? E.g. have the 1 node run with the correct data,
> and then hook it into our existing cluster and call repair on it to restore
> data on the rest of our nodes?
>
> Thanks for your help!
>
> Cheers,
>
> Artur
>

Backup procedure

Posted by Artur Kronenberg <ar...@openmarket.com>.
Hi,

we are running a 7 node cluster with an RF of 5. Each node holds about 
70% of the data and we are now wondering about the backup process.

1. Is there a best practice procedure or a tool that we can use to have
one backup that holds 100% of the data, or is it necessary for us to
take multiple backups?

2. If we have to use multiple backups, is there a way to combine them?
We would like to be able to start up a 1-node cluster that holds 100% of
the data if necessary. Can we just chuck all sstables into the data directory
and cassandra will figure out the rest?

3. How do we handle the commitlog files from all of our nodes? Given
we'd like to restore to a certain point in time and we have all the
commitlogs, can we have commitlogs from multiple locations in the
commitlog folder and will cassandra pick and replay the right thing?

4. If all of the above would work, could we, in case of emergency, set up a
massive 1-node cluster that holds 100% of the data and repair the rest
of our cluster based off of it? E.g. have the 1 node run with the correct
data, and then hook it into our existing cluster and call repair on it
to restore data on the rest of our nodes?

Thanks for your help!

Cheers,

Artur

Re: repair -pr does not return

Posted by Jan Kesten <ja...@dg6obo.de>.
Hi Duncan,

> is it actually doing something or does it look like it got stuck?  
> 2.0.7 has a fix for a getting stuck problem.

it starts with sending merkle trees and streaming for some time (some
hours in fact) and then just seems to hang. So I'll try to update and
see if that solves the issue. Thanks for that hint!

Cheers,
Jan



Re: repair -pr does not return

Posted by Duncan Sands <du...@gmail.com>.
Hi Jan,

On 02/05/14 09:29, Jan Kesten wrote:
> Hello everyone,
>
> I'm running a cassandra cluster with 2.0.6 and 6 nodes. As far as I know,
> routine repairs are still mandatory for handling tombstones - even though I
> noticed that the cluster now does a "snapshot repair" by default.
>
> Now my cluster has been running for a while and has a load of about 200 GB per
> node - running "nodetool repair -pr" on one of the nodes seems to run forever;
> right now it has been running for 2 complete days and has not returned.

is it actually doing something or does it look like it got stuck?  2.0.7 has a 
fix for a getting stuck problem.
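
A rough way to tell the two apart (just a sanity check, not definitive):

    nodetool tpstats    # are the AntiEntropy* pools showing active or pending tasks?
    nodetool netstats   # are any repair streams still moving?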

Ciao, Duncan.

>
> Any suggestions?
>
> Thanks in advance,
> Jan
>
>
>