Posted to user@cassandra.apache.org by Andrey Ilinykh <ai...@gmail.com> on 2012/12/07 18:24:04 UTC

Re: Batch mutation streaming

Cassandra uses Thrift messages to pass data to and from the server. A batch is
just a convenient way to create such a message. Nothing happens until you
send this message. Probably, this is what you call "close the batch".
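A rough sketch of the idea in Python. This is illustrative only: the dict-of-dicts shape mirrors the Thrift `batch_mutate` mutation map, but `make_column`, `build_batch`, and the `"Events"` column family are hypothetical names, and a real client would use the generated Thrift types over an open connection. The point is that the batch is just a client-side data structure until the single send at the end:

```python
# A Thrift-style batch is accumulated entirely on the client.
# Nothing reaches Cassandra until the one batch_mutate() call.

def make_column(name, value, timestamp):
    # Stand-in for a Thrift Column wrapped in a Mutation.
    return {"name": name, "value": value, "timestamp": timestamp}

def build_batch(events, column_family):
    # mutation_map shape: {row_key: {column_family: [mutations...]}}
    mutation_map = {}
    for key, name, value, ts in events:
        cols = mutation_map.setdefault(key, {}).setdefault(column_family, [])
        cols.append(make_column(name, value, ts))
    return mutation_map

events = [
    ("row1", "col_a", "1", 100),
    ("row1", "col_b", "2", 101),
    ("row2", "col_a", "3", 102),
]
batch = build_batch(events, "Events")
# Every appended event now sits in client memory; the client would send
# the whole thing in one call, e.g. client.batch_mutate(batch, cl).
```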

Thank you,
  Andrey


On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x...@gmail.com> wrote:

> Hi,
>
> I'd like my app to stream a large number of events into Cassandra that
> originate from the same network input stream. If I create one batch
> mutation, can I just keep appending events to the Cassandra batch until I'm
> done, or are there some practical considerations about doing this (e.g. too
> much stuff buffering up on the client or server side, visibility of the
> data within the batch that hasn't been closed by the client yet)? Barring
> any discussion about atomicity, if I were able to stream a largish source
> into Cassandra, what would happen if the client crashed and didn't close
> the batch? Or is this kind of thing just a normal occurrence that Cassandra
> has to be aware of anyway?
>
> Cheers,
>
> Ben

Re: Batch mutation streaming

Posted by Ben Hood <0x...@gmail.com>.
Hey Aaron,

That sounds sensible - thanks for the heads up.

Cheers,

Ben


Re: Batch mutation streaming

Posted by aaron morton <aa...@thelastpickle.com>.
> (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).
Yerp. 
And every row mutation in your batch becomes a task in the Mutation thread pool. If one replica gets 500 row mutations from one client request, it will take a while for the (default) 32 threads to chew through them. While this is going on, other client requests will be effectively blocked. 

Depending on the number of clients, I would start with say 50 rows per mutation and keep an eye on the *request* latency. 
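The chunking described above can be sketched like this in plain Python. The batch size of 50 is the suggested starting point, to be tuned while watching request latency; sending each chunk (e.g. via the client's `batch_mutate`) is left out so the sketch stays self-contained:

```python
# Split an event stream into bounded batches instead of one huge mutation,
# so no single client request floods a replica's Mutation thread pool.

BATCH_SIZE = 50  # starting point only; tune against request latency

def batches(events, size=BATCH_SIZE):
    """Yield successive lists of at most `size` events from any iterable."""
    chunk = []
    for event in events:
        chunk.append(event)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial batch

# Usage sketch: 120 streamed events go out as three requests of 50 + 50 + 20.
sizes = [len(b) for b in batches(range(120))]
```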

Hope that helps. 


-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com



Re: Batch mutation streaming

Posted by Ben Hood <0x...@gmail.com>.
Thanks for the clarification Andrey. If that is the case, I had better ensure that I don't put the entire contents of a very long input stream into a single batch, since that is presumably going to cause a very large message to accumulate on the client side (and if the message is being decoded on the server side as a complete message, then presumably the same resident memory consumption applies there too).

Cheers,


Ben
