Posted to user@cassandra.apache.org by Sonny Heer <so...@gmail.com> on 2010/05/14 18:55:58 UTC

Timeouts running batch_mutate

Hey,

I'm running a map/reduce job that reads from an HDFS directory and
reduces into Cassandra using the batch_mutate method.

The reducer builds the list of row mutations for a single row and
calls batch_mutate at the end.  As I move to a larger dataset, I'm
seeing the following exception:

Caused by: TimedOutException()
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:15361)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:796)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:772)

I changed RpcTimeoutInMillis to 60 seconds with no effect.  What
configuration changes should I make when doing write-intensive
operations with batch_mutate?
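One common mitigation for batch_mutate timeouts is to cap the size of each RPC rather than sending a whole row's mutations in one call. A generic sketch in plain Java (the chunking logic only; sending each chunk through the actual Thrift client is left as a comment, since the client setup is not shown in the thread):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Split a large list of mutations into fixed-size chunks so each
    // batch_mutate call stays small enough to finish within the RPC timeout.
    static <T> List<List<T>> chunks(List<T> items, int maxPerBatch) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += maxPerBatch) {
            out.add(new ArrayList<>(
                    items.subList(i, Math.min(i + maxPerBatch, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> mutations = new ArrayList<>();
        for (int i = 0; i < 2500; i++) mutations.add(i);
        List<List<Integer>> batches = chunks(mutations, 1000);
        System.out.println(batches.size() + " batches");  // prints "3 batches"
        // In the reducer, each chunk would go out as its own batch_mutate call.
    }
}
```

Smaller batches trade a little extra RPC overhead for a bounded amount of work per call, which keeps each call comfortably inside the timeout.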

Re: Timeouts running batch_mutate

Posted by Sonny Heer <so...@gmail.com>.
meant to say OPP :)

On Thu, May 20, 2010 at 8:21 AM, Sonny Heer <so...@gmail.com> wrote:
> Yes, I'm using OOP, because of the way we modeled our data.  Does
> Cassandra not handle OOP intensive write operations?  Is HBase a
> better approach if one must use OOP?

Re: Timeouts running batch_mutate

Posted by Jonathan Ellis <jb...@gmail.com>.
HBase has the same problem.

Your choices are basically (a) figure out a way to not do all writes
sequentially or (b) figure out a way to model w/o OPP.

Most Cassandra users go with option (b).
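Option (b) amounts to letting keys hash across the ring instead of relying on key order. A rough illustration in plain Java (not Cassandra code; the four "buckets" stand in for nodes) of why hashed placement spreads a burst of sequential keys while order-preserving placement concentrates it:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class PartitionDemo {
    // Count how many of nKeys sequential row keys land in each of nBuckets
    // when placed by a hash of the key (the idea behind RandomPartitioner).
    static int[] hashedCounts(int nKeys, int nBuckets) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        int[] counts = new int[nBuckets];
        for (int i = 0; i < nKeys; i++) {
            byte[] d = md5.digest(("row-" + i).getBytes(StandardCharsets.UTF_8));
            counts[(d[0] & 0xFF) % nBuckets]++;
        }
        return counts;
    }

    // Under an order-preserving scheme, sequential keys all fall in one
    // range, so the node owning that range absorbs the entire burst.
    static int[] orderedCounts(int nKeys, int nBuckets) {
        int[] counts = new int[nBuckets];
        counts[0] = nKeys;  // every sequential key lands in the first range
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(java.util.Arrays.toString(hashedCounts(1000, 4)));
        System.out.println(java.util.Arrays.toString(orderedCounts(1000, 4)));
    }
}
```

The hashed counts come out roughly even across all four buckets; the ordered counts pile everything onto one, which is the hot spot the thread is describing.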




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Timeouts running batch_mutate

Posted by Sonny Heer <so...@gmail.com>.
Yes, I'm using OOP, because of the way we modeled our data.  Does
Cassandra not handle OOP intensive write operations?  Is HBase a
better approach if one must use OOP?



Re: Timeouts running batch_mutate

Posted by Jonathan Ellis <jb...@gmail.com>.
Are you using OPP (the order-preserving partitioner)?  That will tend
to create hot spots like this, which is why most people deploy on RP
(the random partitioner).

If you are using RP you may simply need to add C* capacity, or take
TimeoutException as a signal to throttle your activity.
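Treating TimeoutException as backpressure can be sketched as a retry loop with exponential backoff. This is illustrative only; the operation here is a stand-in for the real Thrift call, which would return normally on success and be caught on TimedOutException:

```java
import java.util.concurrent.Callable;

public class BackoffRetry {
    // Run an operation, backing off exponentially after each failed try.
    // The operation returns false to signal a timeout.
    // Returns the number of attempts it took to succeed.
    static int withBackoff(Callable<Boolean> op, int maxAttempts, long baseSleepMs)
            throws Exception {
        long sleep = baseSleepMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (op.call()) return attempt;
            Thread.sleep(sleep);  // throttle before retrying
            sleep *= 2;           // exponential backoff
        }
        throw new Exception("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        // Simulated write that times out twice, then succeeds.
        int[] calls = {0};
        int attempts = withBackoff(() -> ++calls[0] >= 3, 5, 1);
        System.out.println("succeeded on attempt " + attempts);  // attempt 3
    }
}
```

Backing off gives the overloaded node time to drain its queue instead of piling on more work that will also time out.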


Re: Timeouts running batch_mutate

Posted by Sonny Heer <so...@gmail.com>.
Yeah, there are many writes happening at the same time to any given
Cassandra node.

E.g. assume 10 machines, all running Hadoop and Cassandra.  The
Hadoop nodes randomly pick a Cassandra node and write to it directly
using batch_mutate.

After increasing the timeout even more, I don't get that exception
anymore, but now I'm getting UnavailableException.

The wiki states this happens when not all of the replicas required
could be created and/or read.  How do we resolve this problem?  The
write consistency is ONE.

thanks
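For context, a write is "unavailable" when fewer replicas are reachable than the consistency level demands, checked before the write is even attempted. A simplified model (not Cassandra's actual code; the blockFor values are the usual simple-strategy rules):

```java
public class Availability {
    // How many replicas must be reachable for a write at a given level.
    // Simplified: ONE -> 1, QUORUM -> RF/2 + 1, ALL -> RF.
    static int blockFor(String level, int replicationFactor) {
        switch (level) {
            case "ONE":    return 1;
            case "QUORUM": return replicationFactor / 2 + 1;
            case "ALL":    return replicationFactor;
            default: throw new IllegalArgumentException(level);
        }
    }

    // UnavailableException corresponds to this returning false.
    static boolean available(int liveReplicas, String level, int rf) {
        return liveReplicas >= blockFor(level, rf);
    }

    public static void main(String[] args) {
        // With RF=3 and CL=ONE, one live replica is enough...
        System.out.println(available(1, "ONE", 3));  // true
        // ...so UnavailableException at ONE means no replica for that
        // key's range was considered up at all.
        System.out.println(available(0, "ONE", 3));  // false
    }
}
```

At consistency ONE, getting UnavailableException suggests the nodes owning that key range were down or so overloaded they were marked dead, which points back at the hot-spot problem rather than at a consistency setting.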



Re: Timeouts running batch_mutate

Posted by Jonathan Ellis <jb...@gmail.com>.
RpcTimeoutInMillis should be sufficient.

You can turn on debug logging to see how long the destination node is
actually taking to do the write (or look at cfstats, if no other
writes are going on).
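Turning on debug logging in this era of Cassandra means editing the log4j configuration. A sketch, assuming the stock log4j.properties layout (the appender names on the rootLogger line vary by install, so keep whatever the shipped file already has there):

```properties
# Keep the root at INFO so third-party libraries stay quiet...
log4j.rootLogger=INFO,stdout,R
# ...but raise Cassandra's own loggers to DEBUG to see per-write timing.
log4j.logger.org.apache.cassandra=DEBUG
```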
