Posted to user@cassandra.apache.org by Moty Kosharovsky <mo...@gmail.com> on 2013/04/15 03:12:45 UTC

re-execution of failed queries with rpc_timeout

Hello,

I'm running a 12-node cluster with Cassandra 1.1.5 and Oracle JDK 1.6.0_35.
Our application constantly writes large updates with CQL. Once in a while,
an rpc_timeout will occur.

Since a lot of the information is counters, it's impossible for me to
tell whether the updates complete partially on rpc_timeout or Cassandra
somehow rolls back the change completely, and hence I can't tell if I
should re-execute the query on rpc_timeout (with double processing being a
bigger concern than missing updates).

I am thinking, but unsure of this, that if I switch to LOCAL_QUORUM,
rpc_timeout will always mean that the update was not processed as a whole.
In all other cases, the rpc_timeout might be thrown from a remote node (not
the one I'm connected to), and hence some parts of the update will be
performed and other parts will not.

Has anyone solved this issue before?

Kind Regards,
Kosha

Re: re-execution of failed queries with rpc_timeout

Posted by Edward Capriolo <ed...@gmail.com>.

Newer versions of Cassandra include extra information in the
exception. I **think** you can use that information to determine how many
machines the operation succeeded on. However, I do not think that
information means you can make counters that timed out "bulletproof".

On Tue, Apr 16, 2013 at 5:08 PM, aaron morton <aa...@thelastpickle.com>wrote:

> If you are using counters, you need to do everything you can to avoid
> timeouts. In the worst case we do not know where the increment has been
> applied. The increment is applied on a lead replica and then replicated to
> the others; if the coordinator is not the lead, it may not know whether the
> increment was applied at all.
>
> Start by reducing the size of the updates. Larger batches do not always
> mean better performance.
>
>> In all other cases, the rpc_timeout might be thrown from a remote node
>> (not the one I'm connected to), and hence some parts of the update will be
>> performed and other parts will not.
>>
> TimedOutException is always thrown from the coordinator you are connected
> to.

Re: re-execution of failed queries with rpc_timeout

Posted by aaron morton <aa...@thelastpickle.com>.

If you are using counters, you need to do everything you can to avoid timeouts. In the worst case we do not know where the increment has been applied. The increment is applied on a lead replica and then replicated to the others; if the coordinator is not the lead, it may not know whether the increment was applied at all.
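
A toy model may make the risk concrete. The sketch below is not Cassandra; `FlakyCounter` is a hypothetical in-memory counter that applies an increment but sometimes loses the acknowledgement, which is the "applied, but the client saw a timeout" case. A blind retry then applies the same delta twice.

```python
class FlakyCounter:
    """Toy counter whose acknowledgements can get lost (illustration only)."""

    def __init__(self, lose_ack_every: int):
        self.value = 0
        self.calls = 0
        self.lose_ack_every = lose_ack_every

    def increment(self, delta: int) -> None:
        self.calls += 1
        self.value += delta  # the increment is applied...
        if self.calls % self.lose_ack_every == 0:
            raise TimeoutError("ack lost")  # ...but the client is never told


def add_with_blind_retry(counter: FlakyCounter, delta: int) -> None:
    """Retry on timeout without knowing whether the first attempt landed."""
    try:
        counter.increment(delta)
    except TimeoutError:
        counter.increment(delta)  # applies delta a second time


counter = FlakyCounter(lose_ack_every=3)
for _ in range(3):
    add_with_blind_retry(counter, 1)

print(counter.value)  # 4, although only 3 increments were intended
```

The third call is applied and then reported as a timeout, so the retry pushes the total to 4. An overwrite of a plain value would not have this problem, because applying it twice leaves the same result.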

Start by reducing the size of the updates. Larger batches do not always mean better performance. 
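
As a sketch of what reducing the update size could look like on the client side, the helper below splits a list of pending updates into smaller groups that would each be sent as their own request. The name `chunked` and the chunk size of 3 are illustrative assumptions; the right size depends on your data and cluster.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical pending updates; in practice these would be CQL statements.
updates = [f"update-{i}" for i in range(7)]
batches = list(chunked(updates, 3))

print([len(batch) for batch in batches])  # [3, 3, 1]
```

Smaller requests give the coordinator less work per call, and when one does time out, fewer updates are left in an unknown state.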

> In all other cases, the rpc_timeout might be thrown from a remote node (not the one I'm connected to), and hence some parts of the update will be performed and other parts will not.
TimedOutException is always thrown from the coordinator you are connected to.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/04/2013, at 1:38 PM, Moty Kosharovsky <mo...@gmail.com> wrote:

> Sorry, not LOCAL QUORUM, I meant "ANY" quorum.

Re: re-execution of failed queries with rpc_timeout

Posted by Moty Kosharovsky <mo...@gmail.com>.

Sorry, not LOCAL QUORUM, I meant "ANY" quorum.

