Posted to user@cassandra.apache.org by Raj N <ra...@gmail.com> on 2012/04/28 16:47:14 UTC

nodetool repair cassandra 0.8.4 HELP!!!

I have a 6-node Cassandra cluster (DC1=3, DC2=3) with 60 GB of data on each
node. I was bulk loading data over the weekend, but we forgot to turn off
the weekly nodetool repair job. As a result, repair was interfering while we
were bulk loading data. I canceled the repair by restarting the nodes.
Unfortunately, after the restart it looks like I don't have any data on
those nodes when I use list in cassandra-cli. I ran repair on one of the
affected nodes, but repair seems to be taking forever, and disk usage has
almost tripled. I stopped the repair again for fear of running out of disk
space. After the restart, disk usage is at 50%, whereas the good nodes are
at 25%. How should I proceed from here? When I run list in cassandra-cli I
do see data on the affected node, but how can I be sure I have all the
data? Should I run repair again? Should I clean up the disk by clearing
snapshots? Or should I just drop the column families and bulk load the data
again?

Thanks
-Raj

Re: nodetool repair cassandra 0.8.4 HELP!!!

Posted by aaron morton <aa...@thelastpickle.com>.

When you start a node, does it log that it's opening SSTables?

After starting, what does nodetool cfstats say for the node?

Can you connect with cassandra-cli and do a get?
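
As a cross-check on what the log and cfstats report, the data directory can also be inspected directly. The following is a minimal sketch, not an official tool. It assumes the usual on-disk layout of one directory per keyspace, SSTable data files whose names end in `-Data.db`, and snapshots kept under `<keyspace>/snapshots/`. The function name and the returned field names are made up for this example.

```python
import os
from collections import defaultdict


def summarize_data_dir(data_dir):
    """Summarize live SSTable files and snapshot bytes per keyspace directory.

    Assumes the layout <data_dir>/<keyspace>/... with snapshots stored under
    <data_dir>/<keyspace>/snapshots/. Adjust if your layout differs.
    """
    summary = defaultdict(
        lambda: {"live_sstables": 0, "live_bytes": 0, "snapshot_bytes": 0}
    )
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for root, _dirs, files in os.walk(ks_path):
            # Anything below <keyspace>/snapshots/ is snapshot data, not live data.
            rel = os.path.relpath(root, ks_path)
            in_snapshot = rel.split(os.sep)[0] == "snapshots"
            for name in files:
                size = os.path.getsize(os.path.join(root, name))
                if in_snapshot:
                    # Snapshot files are hard links, so these bytes may be
                    # shared with live files that still exist.
                    summary[keyspace]["snapshot_bytes"] += size
                else:
                    summary[keyspace]["live_bytes"] += size
                    if name.endswith("-Data.db"):
                        summary[keyspace]["live_sstables"] += 1
    return dict(summary)
```

Calling `summarize_data_dir("/var/lib/cassandra/data")` (the path is an assumption; use the data directory from your configuration) on the affected node and on a good node gives a rough comparison. A keyspace with zero live SSTables would be consistent with the node not opening any SSTables at startup, and a large snapshot figure suggests that clearing snapshots could free space.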

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/04/2012, at 10:45 PM, Raj N wrote:

> I tried it on one column family. I believe there is a bug in 0.8.* where repair ignores the column family argument. I tried this multiple times on different nodes. Every time, disk utilization went up to 80% on a 500 GB disk, and I would eventually kill the repair. I only have 60 GB worth of data. I see this JIRA -
> 
> https://issues.apache.org/jira/browse/CASSANDRA-2324 
> 
> But that says it was fixed in 0.8 beta. Is this still broken in 0.8.4?
> 
> I also don't understand why the data was inconsistent in the first place. I read and write at LOCAL_QUORUM. 
> 
> Thanks
> -Raj

Re: nodetool repair cassandra 0.8.4 HELP!!!

Posted by Raj N <ra...@gmail.com>.

I tried it on one column family. I believe there is a bug in 0.8.* where
repair ignores the column family argument. I tried this multiple times on
different nodes. Every time, disk utilization went up to 80% on a 500 GB
disk, and I would eventually kill the repair. I only have 60 GB worth of
data. I see this JIRA -

https://issues.apache.org/jira/browse/CASSANDRA-2324

But that says it was fixed in 0.8 beta. Is this still broken in 0.8.4?

I also don't understand why the data was inconsistent in the first place. I
read and write at LOCAL_QUORUM.
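
For reference, the arithmetic behind LOCAL_QUORUM can be sketched as follows. This assumes that "DC1=3, DC2=3" means a replication factor of 3 in each data center, which the thread does not confirm. The function names are made up for this example. It shows only what LOCAL_QUORUM waits for; it does not by itself explain data going missing after a restart.

```python
def quorum(replication_factor):
    """Quorum size: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def local_quorum_summary(rf_by_dc, local_dc):
    """Describe what a LOCAL_QUORUM operation in local_dc waits for."""
    local_rf = rf_by_dc[local_dc]
    q = quorum(local_rf)
    total_replicas = sum(rf_by_dc.values())
    return {
        "local_quorum": q,
        # R + W > RF means every local read overlaps the latest local write.
        "local_reads_see_local_writes": q + q > local_rf,
        # Replicas whose acknowledgement is not awaited before a write succeeds.
        "replicas_not_required_to_ack": total_replicas - q,
    }


# Assumed replication factors: 3 replicas in each of the two data centers.
summary = local_quorum_summary({"DC1": 3, "DC2": 3}, "DC1")
print(summary)
```

With these assumed settings the quorum is 2 of the 3 local replicas, so local reads and writes overlap, but a write can succeed before the remaining 4 of the 6 replicas have acknowledged it. Replicas that miss a write rely on hinted handoff, read repair, or nodetool repair to catch up.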

Thanks
-Raj

On Sun, Apr 29, 2012 at 2:06 AM, Watanabe Maki <wa...@gmail.com> wrote:

> You should run repair. If disk space is the problem, try running cleanup
> and a major compaction before the repair.
> You can limit the amount of data streamed by running repair on each column
> family separately.
>
> maki

Re: nodetool repair cassandra 0.8.4 HELP!!!

Posted by Watanabe Maki <wa...@gmail.com>.

You should run repair. If disk space is the problem, try running cleanup and a major compaction before the repair.
You can limit the amount of data streamed by running repair on each column family separately.
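
The sequence above (cleanup, major compaction, then repair one column family at a time) could be scripted roughly as follows. This is a sketch under assumptions: the host, keyspace, and column family names are placeholders, and the `nodetool` subcommands are used in their `cleanup <keyspace>`, `compact <keyspace>`, and `repair <keyspace> <cf>` forms, which should be checked against the usage output of the installed version. Note that elsewhere in this thread Raj reports that 0.8.4 may not honor the column family argument to repair.

```python
import subprocess


def repair_plan(host, keyspace, column_families):
    """Build nodetool argument lists: cleanup, major compaction, per-CF repair."""
    base = ["nodetool", "-h", host]
    plan = [
        base + ["cleanup", keyspace],
        base + ["compact", keyspace],
    ]
    # One repair invocation per column family to limit streaming per run.
    for cf in column_families:
        plan.append(base + ["repair", keyspace, cf])
    return plan


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


# Placeholder names for illustration only.
plan = repair_plan("localhost", "MyKeyspace", ["Users", "Events"])
run_plan(plan)  # dry run: prints the commands without executing them
```

Running it as a dry run first makes it easy to review the commands. Passing `dry_run=False` executes them in order, which leaves time between repairs to watch disk usage on the node.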

maki
