Posted to user@flink.apache.org by kant kodali <ka...@gmail.com> on 2016/10/27 08:17:29 UTC

Can we do batch writes on cassandra using flink while leveraging the locality?

Can we do batch writes on Cassandra using Flink while leveraging the
locality? For example, batch writes in Cassandra put pressure on the
coordinator, but since the connector is built to leverage locality, I was
wondering if we could send each batch of writes to the node where that
batch belongs?

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

Posted by Chesnay Schepler <ch...@apache.org>.
Hello,

The main issue that prevented us from writing batches is that there is a
server-side limit on how big a batch may be, but there was no way to tell
how big the batch you are currently building up actually is.
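
For illustration, here is a minimal sketch (assuming the DataStax Java
driver 3.x; the class name and threshold are made up) of grouping writes
into an UNLOGGED batch that is flushed by statement count. The count is
only a rough proxy for the serialized size that the server-side limit
actually applies to, which is exactly the problem described above:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.BoundStatement;
    import com.datastax.driver.core.Session;

    // Illustrative only: collects bound statements and flushes them as one
    // UNLOGGED batch once a statement-count threshold is reached. It does
    // not solve the real problem of knowing the batch's serialized size.
    public class BatchingWriter {

        private final Session session;
        private final int maxStatements;
        private BatchStatement batch =
            new BatchStatement(BatchStatement.Type.UNLOGGED);

        public BatchingWriter(Session session, int maxStatements) {
            this.session = session;
            this.maxStatements = maxStatements;
        }

        public void add(BoundStatement stmt) {
            batch.add(stmt);
            if (batch.size() >= maxStatements) {
                flush();
            }
        }

        public void flush() {
            if (batch.size() > 0) {
                session.execute(batch);
                batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            }
        }
    }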

Regarding locality, I'm not sure a partitioner alone solves this issue.
While the data and the sink instance may be on the right node, the sink
would still have to know which Cassandra instance to write to in order to
actually make use of the locality. I never looked too deeply into data
locality, so I don't know whether/how we would have to change the sink to
do that :(
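
That said, the driver does expose which hosts own a given partition key,
so a sink could at least check whether it is co-located with a replica. A
minimal sketch with driver 3.x (contact point, keyspace and key are made
up):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.Set;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Host;
    import com.datastax.driver.core.Metadata;

    public class ReplicaLookup {
        public static void main(String[] args) {
            Cluster cluster =
                Cluster.builder().addContactPoint("127.0.0.1").build();
            Metadata metadata = cluster.getMetadata();

            // For a single text partition key the routing key is simply its
            // UTF-8 bytes; composite keys need the driver's serialization.
            ByteBuffer routingKey =
                ByteBuffer.wrap("some-key".getBytes(StandardCharsets.UTF_8));
            Set<Host> replicas = metadata.getReplicas("my_keyspace", routingKey);

            for (Host host : replicas) {
                // A locality-aware sink could compare these addresses with
                // the address of the TaskManager the sink instance runs on.
                System.out.println(host.getAddress());
            }
            cluster.close();
        }
    }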

Regards,
Chesnay

On 01.11.2016 20:29, Stephan Ewen wrote:
> Hi!
>
> I do not know the details of how Cassandra supports batched writes, 
> but here are some thoughts:
>
>   - Grouping writes that go to the same partition together into one
> batch write request makes sense. If you have some sample code for
> that, it should not be too hard to integrate into the Flink Cassandra
> connector.
>
>   - If you know the partitioning scheme in Cassandra and you use
> "DataStream.partitionCustom(partitioner, key)", then all write requests
> from one parallel sink instance should go to the same Cassandra node
> (or a small number of nodes). Would that help?
>
> Greetings,
> Stephan
>
>
>
>
> On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <kanth909@gmail.com> wrote:
>
>     The Spark Cassandra connector does it! But I don't think it really
>     implements a custom partitioner; I think it just leverages the
>     token-aware policy and does batch writes within a partition by
>     default, though you can also batch across partitions that share the
>     same replica!
>
>     On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <scarey@expedia.com> wrote:
>
>         It certainly seems possible to write a Partitioner that does
>         what you describe. I started implementing one but didn't have
>         time to finish it. I think the main difficulty is in properly
>         dealing with partition ownership changes in Cassandra… if you
>         are maintaining state in Flink and the partitioning changes,
>         your job might produce inaccurate output. If, on the other
>         hand, you are only using the partitioner just before the
>         output, dynamic partitioning changes might be ok.
>
>
>         From: kant kodali <kanth909@gmail.com>
>         Date: Thursday, October 27, 2016 at 3:17 AM
>         To: <user@flink.apache.org>
>         Subject: Can we do batch writes on cassandra using flink while
>         leveraging the locality?
>
>         Can we do batch writes on Cassandra using Flink while leveraging
>         the locality? For example, batch writes in Cassandra put pressure
>         on the coordinator, but since the connector is built to leverage
>         locality, I was wondering if we could send each batch of writes
>         to the node where that batch belongs?
>
>
>


Re: Can we do batch writes on cassandra using flink while leveraging the locality?

Posted by Stephan Ewen <se...@apache.org>.
Hi!

I do not know the details of how Cassandra supports batched writes, but
here are some thoughts:

  - Grouping writes that go to the same partition together into one batch
write request makes sense. If you have some sample code for that, it should
not be too hard to integrate into the Flink Cassandra connector.

  - If you know the partitioning scheme in Cassandra and you use
"DataStream.partitionCustom(partitioner, key)", then all write requests from
one parallel sink instance should go to the same Cassandra node (or a small
number of nodes). Would that help? (See the sketch below.)
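
A rough sketch of that call (the modulo hash below is only a placeholder;
lining up with Cassandra's actual data placement would require mirroring
the cluster's partitioner, e.g. Murmur3, and its token ranges):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    public class CassandraAwarePartitioning {

        // Routes records with the same Cassandra partition key to the same
        // parallel sink instance, so each sink can batch writes per key.
        public static DataStream<Tuple2<String, String>> partitionByCassandraKey(
                DataStream<Tuple2<String, String>> records) {

            Partitioner<String> placeholder = new Partitioner<String>() {
                @Override
                public int partition(String key, int numPartitions) {
                    // Placeholder hash; not Cassandra's Murmur3 token.
                    return Math.abs(key.hashCode() % numPartitions);
                }
            };

            KeySelector<Tuple2<String, String>, String> bySinkKey =
                new KeySelector<Tuple2<String, String>, String>() {
                    @Override
                    public String getKey(Tuple2<String, String> record) {
                        return record.f0; // the Cassandra partition key
                    }
                };

            return records.partitionCustom(placeholder, bySinkKey);
        }
    }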

Greetings,
Stephan




On Fri, Oct 28, 2016 at 8:57 AM, kant kodali <ka...@gmail.com> wrote:

> The Spark Cassandra connector does it! But I don't think it really
> implements a custom partitioner; I think it just leverages the token-aware
> policy and does batch writes within a partition by default, though you can
> also batch across partitions that share the same replica!
>
> On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <sc...@expedia.com> wrote:
>
>> It certainly seems possible to write a Partitioner that does what you
>> describe. I started implementing one but didn't have time to finish it. I
>> think the main difficulty is in properly dealing with partition ownership
>> changes in Cassandra… if you are maintaining state in Flink and the
>> partitioning changes, your job might produce inaccurate output. If, on the
>> other hand, you are only using the partitioner just before the output,
>> dynamic partitioning changes might be ok.
>>
>>
>> From: kant kodali <ka...@gmail.com>
>> Date: Thursday, October 27, 2016 at 3:17 AM
>> To: <us...@flink.apache.org>
>> Subject: Can we do batch writes on cassandra using flink while
>> leveraging the locality?
>>
>> Can we do batch writes on Cassandra using Flink while leveraging the
>> locality? For example, batch writes in Cassandra put pressure on the
>> coordinator, but since the connector is built to leverage locality, I was
>> wondering if we could send each batch of writes to the node where that
>> batch belongs?
>>
>
>

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

Posted by kant kodali <ka...@gmail.com>.
The Spark Cassandra connector does it! But I don't think it really
implements a custom partitioner; I think it just leverages the token-aware
policy and does batch writes within a partition by default, though you can
also batch across partitions that share the same replica!
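
For reference, the token-aware routing mentioned here is a standard setting
of the DataStax Java driver (the sketch assumes driver 3.x and a made-up
contact point); with it, every request is sent straight to a replica of its
partition key rather than an arbitrary coordinator:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    public class TokenAwareClusterFactory {
        // Builds a Cluster whose load balancing policy prefers replicas of
        // the partition key, so the coordinator also owns the data written.
        public static Cluster build(String contactPoint) {
            return Cluster.builder()
                .addContactPoint(contactPoint)
                .withLoadBalancingPolicy(
                    new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().build()))
                .build();
        }
    }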

On Thu, Oct 27, 2016 at 8:41 AM, Shannon Carey <sc...@expedia.com> wrote:

> It certainly seems possible to write a Partitioner that does what you
> describe. I started implementing one but didn't have time to finish it. I
> think the main difficulty is in properly dealing with partition ownership
> changes in Cassandra… if you are maintaining state in Flink and the
> partitioning changes, your job might produce inaccurate output. If, on the
> other hand, you are only using the partitioner just before the output,
> dynamic partitioning changes might be ok.
>
>
> From: kant kodali <ka...@gmail.com>
> Date: Thursday, October 27, 2016 at 3:17 AM
> To: <us...@flink.apache.org>
> Subject: Can we do batch writes on cassandra using flink while leveraging
> the locality?
>
> Can we do batch writes on Cassandra using Flink while leveraging the
> locality? For example, batch writes in Cassandra put pressure on the
> coordinator, but since the connector is built to leverage locality, I was
> wondering if we could send each batch of writes to the node where that
> batch belongs?
>

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

Posted by Shannon Carey <sc...@expedia.com>.
It certainly seems possible to write a Partitioner that does what you describe. I started implementing one but didn't have time to finish it. I think the main difficulty is in properly dealing with partition ownership changes in Cassandra… if you are maintaining state in Flink and the partitioning changes, your job might produce inaccurate output. If, on the other hand, you are only using the partitioner just before the output, dynamic partitioning changes might be ok.
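
A minimal sketch of such a Partitioner, assuming the DataStax Java driver 3.x
(contact point, keyspace and the host-to-channel mapping are made up). It also
illustrates the caveat above: if ring ownership changes, getReplicas() starts
returning different hosts and keys silently move to other channels, which is
only acceptable when the partitioner sits right before a stateless sink:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import org.apache.flink.api.common.functions.Partitioner;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Host;

    // Sketch of a replica-aware Flink Partitioner: for each key it asks the
    // Cassandra driver which hosts own the key and folds one replica onto a
    // Flink channel.
    public class ReplicaAwarePartitioner implements Partitioner<String> {

        private final String contactPoint;   // illustrative
        private final String keyspace;       // illustrative
        private transient Cluster cluster;   // not serializable, built lazily

        public ReplicaAwarePartitioner(String contactPoint, String keyspace) {
            this.contactPoint = contactPoint;
            this.keyspace = keyspace;
        }

        @Override
        public int partition(String key, int numPartitions) {
            if (cluster == null) {
                cluster = Cluster.builder().addContactPoint(contactPoint).build();
            }
            // Works for a single text partition key; composite keys would
            // need the driver's routing-key serialization.
            ByteBuffer routingKey =
                ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
            List<Host> replicas = new ArrayList<>(
                cluster.getMetadata().getReplicas(keyspace, routingKey));
            if (replicas.isEmpty()) {
                return Math.abs(key.hashCode() % numPartitions);
            }
            // Deterministically pick one replica; a real implementation would
            // cache a host-to-channel mapping instead of hashing addresses.
            replicas.sort(Comparator.comparing(
                (Host h) -> h.getAddress().toString()));
            return Math.abs(
                replicas.get(0).getAddress().hashCode() % numPartitions);
        }
    }

It would be applied via DataStream.partitionCustom(new
ReplicaAwarePartitioner(...), keySelector) immediately before the Cassandra
sink.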


From: kant kodali <ka...@gmail.com>
Date: Thursday, October 27, 2016 at 3:17 AM
To: <us...@flink.apache.org>
Subject: Can we do batch writes on cassandra using flink while leveraging the locality?

Can we do batch writes on Cassandra using Flink while leveraging the locality? For example, batch writes in Cassandra put pressure on the coordinator, but since the connector is built to leverage locality, I was wondering if we could send each batch of writes to the node where that batch belongs?