Posted to user@cassandra.apache.org by Cyril Scetbon <cy...@free.fr> on 2012/03/21 19:33:26 UTC

exception when attempting to truncate a table

Hi,

I'm using version 1.0.7, and when I try to truncate a CF in cqlsh it
raises an error. The message seems incorrect, though, because nodetool
reports every node as Up:

cqlsh:ks1> truncate core;
Unable to complete request: *one or more nodes were unavailable*.

>nodetool -h localhost ring
                                                                              155962751505430129087380028406227096917
10.0.0.61       DC1         RAC1        Up     Normal  75.8 GB         8.33%   0
10.0.0.62       DC1         RAC1        Up     Normal  72.84 GB        8.33%   14178431955039102644307275309657008810
10.0.1.61       DC2         RAC1        Up     Normal  72.54 GB        8.33%   28356863910078205288614550619314017621
10.0.1.62       DC2         RAC1        Up     Normal  108.11 GB       8.33%   42535295865117307932921825928971026432
10.0.0.63       DC1         RAC1        Up     Normal  72.37 GB        8.33%   56713727820156410577229101238628035242
10.0.0.64       DC1         RAC1        Up     Normal  72.56 GB        8.33%   70892159775195513221536376548285044053
10.0.1.63       DC2         RAC1        Up     Normal  73.09 GB        8.33%   85070591730234615865843651857942052864
10.0.1.64       DC2         RAC1        Up     Normal  169.85 GB       8.33%   99249023685273718510150927167599061674
10.0.0.65       DC1         RAC1        Up     Normal  72.33 GB        8.33%   113427455640312821154458202477256070485
10.0.0.66       DC1         RAC1        Up     Normal  93.12 GB        8.33%   127605887595351923798765477786913079296
10.0.1.65       DC2         RAC1        Up     Normal  88.16 GB        8.33%   141784319550391026443072753096570088106
10.0.1.66       DC2         RAC1        Up     Normal  92.46 GB        8.33%   155962751505430129087380028406227096917

Any idea?

Thanks

-- 
Cyril SCETBON

Re: exception when attempting to truncate a table

Posted by Cyril Scetbon <cy...@free.fr>.

On 3/21/12 10:47 PM, Viktor Jevdokimov wrote:
>
> This is a known issue that is due to be fixed (I can't find the exact
> tickets on the tracker).
>
> The node that receives the truncate command (the coordinator) checks
> that all nodes are up, sends a truncate message to every node
> (including itself), and waits up to rpc_timeout_in_ms for the answers
> (this will be changed to a separate timeout setting).
>
> If any node times out, a node-unavailable exception is thrown (this
> will be changed to a timeout exception).
>
> Why does truncate take so long? All memtables have to be flushed to
> free up the commit logs, and then the truncated sstables are
> snapshotted (a setting to disable these snapshots will be introduced).
> All of this takes time.
>
> However, a few moments after the failed truncate you can run a short
> scan to check that the data has actually been truncated.
>
Ok, that's what I've done, and as you mentioned I have some snapshots 
that killed my free space :( I didn't know that a snapshot was taken for 
each truncate request
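
For anyone hitting the same free-space problem, here is a minimal Python sketch that sums the size of every "snapshots" directory under a Cassandra data directory, so you can see how much space the truncate snapshots hold. The function name snapshot_usage is hypothetical, and the path in the usage note is only the usual default; check data_file_directories in your cassandra.yaml. The sizes are apparent file sizes, so treat them as an estimate.

```python
import os


def snapshot_usage(data_dir):
    """Return {snapshots_dir: total_bytes} for each 'snapshots' directory under data_dir."""
    usage = {}
    for root, dirs, _files in os.walk(data_dir):
        if "snapshots" in dirs:
            snap_root = os.path.join(root, "snapshots")
            total = 0
            # Sum every file below this snapshots directory.
            for sub_root, _sub_dirs, files in os.walk(snap_root):
                for name in files:
                    total += os.path.getsize(os.path.join(sub_root, name))
            usage[snap_root] = total
            # Already counted, so do not walk into it a second time.
            dirs.remove("snapshots")
    return usage
```

Calling snapshot_usage("/var/lib/cassandra/data") (an assumed path) returns one entry per snapshots directory. Once you are sure the snapshots are not needed, nodetool clearsnapshot removes them.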


-- 
Cyril SCETBON

RE: exception when attempting to truncate a table

Posted by Viktor Jevdokimov <Vi...@adform.com>.

This is a known issue that is due to be fixed (I can't find the exact tickets on the tracker).

The node that receives the truncate command (the coordinator) checks that all nodes are up, sends a truncate message to every node (including itself), and waits up to rpc_timeout_in_ms for the answers (this will be changed to a separate timeout setting).
If any node times out, a node-unavailable exception is thrown (this will be changed to a timeout exception).

Why does truncate take so long? All memtables have to be flushed to free up the commit logs, and then the truncated sstables are snapshotted (a setting to disable these snapshots will be introduced). All of this takes time.

However, a few moments after the failed truncate you can run a short scan to check that the data has actually been truncated.
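
The behaviour described above can be sketched as a toy model in Python. This is not Cassandra's code: UnavailableError, coordinate_truncate, and the per-node durations are all hypothetical names and numbers chosen for illustration. It only shows why a slow flush or snapshot on a single node surfaces as an "unavailable" error even though every node is up.

```python
class UnavailableError(Exception):
    """Models the 'one or more nodes were unavailable' error seen in cqlsh."""

    def __init__(self, message, nodes=()):
        super().__init__(message)
        self.nodes = list(nodes)


def coordinate_truncate(all_nodes, live_nodes, truncate_seconds, rpc_timeout_seconds):
    """Toy coordinator: succeed only if every node is up and answers in time.

    truncate_seconds maps each node to the (made-up) time its flush and
    snapshot would take.
    """
    # Step 1: refuse immediately if any node is known to be down.
    down = [n for n in all_nodes if n not in live_nodes]
    if down:
        raise UnavailableError("some nodes are down", down)
    # Step 2: every node gets the truncate message; any node whose work
    # outlasts the RPC timeout counts as a failure for the whole request.
    late = [n for n in all_nodes if truncate_seconds[n] > rpc_timeout_seconds]
    if late:
        raise UnavailableError("some nodes did not answer in time", late)
    return list(all_nodes)
```

In this model a node that needs 25 seconds against a 10-second timeout makes the whole request fail, although that node would still finish its own truncate, which matches the advice to re-check the data a few moments later.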





Best regards/ Pagarbiai



Viktor Jevdokimov

Senior Developer



Email:  Viktor.Jevdokimov@adform.com

Phone: +370 5 212 3063. Fax: +370 5 261 0453

J. Jasinskio 16C, LT-01112 Vilnius, Lithuania








Re: exception when attempting to truncate a table

Posted by Ben Coverston <be...@datastax.com>.

Run 'show keyspaces' in cassandra-cli and paste the details for ks1
here.



-- 
Ben Coverston
DataStax -- The Apache Cassandra Company

RE: exception when attempting to truncate a table

Posted by Richard Lowe <ri...@arkivum.com>.

I'd double-check the firewall on each node to make sure the storage and RPC ports aren't being blocked.
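
A quick way to run that check is a small Python sketch that tries a TCP connection to each port. The port numbers are Cassandra's defaults for storage_port (7000) and rpc_port (9160) and are an assumption about your cassandra.yaml; port_open and check_nodes are hypothetical helper names.

```python
import socket

STORAGE_PORT = 7000  # assumed default storage_port
RPC_PORT = 9160      # assumed default rpc_port (Thrift)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(hosts, ports=(STORAGE_PORT, RPC_PORT), timeout=3.0):
    """Return the (host, port) pairs that could not be reached from this machine."""
    return [(h, p) for h in hosts for p in ports if not port_open(h, p, timeout)]
```

The result only describes reachability from the machine that runs it, so it would need to be run on every node, against every other node, to catch a one-way block.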

We've found that "Up" in nodetool ring output reflects the gossip status, which means only that at least one other node can contact that node, not necessarily every node in the ring. The entire ring can show as "Up" while some nodes are unreachable from specific other nodes because of the local firewall configuration on those nodes.

For example, if 10.0.0.61 can contact 10.0.1.61 but 10.0.0.62 can't then 10.0.1.61 will still appear as "Up", even though it cannot be contacted by 10.0.0.62.

This is especially relevant in multi-DC environments, where network topology may mean that not all nodes can contact nodes in the other DC. If the network is segregated in this way then you won't be able to use truncate. This seems to have nothing to do with the replication factor of an individual keyspace because truncate requires the entire ring to be available, regardless of whether the keyspace is replicated to all nodes.
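
To make the distinction concrete, here is a toy Python model of the situation described above. It is not how Cassandra's gossip is implemented; appears_up and unreachable_from are hypothetical names, and can_reach is a hand-written set of (source, destination) pairs.

```python
def appears_up(node, nodes, can_reach):
    """'Up' as described above: at least one other node can contact `node`."""
    return any((other, node) in can_reach for other in nodes if other != node)


def unreachable_from(coordinator, nodes, can_reach):
    """Nodes the coordinator itself cannot contact, which would block a truncate."""
    return [n for n in nodes if n != coordinator and (coordinator, n) not in can_reach]


nodes = ["10.0.0.61", "10.0.0.62", "10.0.1.61"]
# Every node can reach every other node, except that 10.0.0.62 cannot reach 10.0.1.61.
can_reach = {(a, b) for a in nodes for b in nodes if a != b}
can_reach.discard(("10.0.0.62", "10.0.1.61"))
```

With this data every node appears up, unreachable_from("10.0.0.61", nodes, can_reach) is empty, and unreachable_from("10.0.0.62", nodes, can_reach) contains 10.0.1.61, so in this model a truncate would only fail when it is issued through 10.0.0.62.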

I should mention that we're using 0.8.x, so the behaviour may have changed in 1.0.x.

-Richard

