Posted to user@cassandra.apache.org by Gerard Maas <ge...@gmail.com> on 2015/09/22 16:54:58 UTC

To batch or not to batch: A question for fast inserts

General advice advocates for individual async inserts as the fastest way to
insert data into Cassandra. Our insertion mechanism is based on that model
and recently we have been evaluating performance, looking to measure and
optimize our ingestion rate.

I ran some ad-hoc benchmarks on the side and stumbled on the observation that
unlogged batch inserts are *A LOT* faster than their async counterparts.

In our tests, unlogged batch shows increased throughput and lower cluster
CPU usage, so I'm wondering where the tradeoff might be.
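
For reference, a minimal sketch of the two write paths being compared, using
the DataStax Java driver from Scala (the keyspace, table and column names are
illustrative assumptions, not our actual schema):

import com.datastax.driver.core.{BatchStatement, Cluster, ResultSetFuture, Session}

object InsertPaths {
  val cluster: Cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session: Session = cluster.connect("my_ks") // hypothetical keyspace
  // Hypothetical table; prepare once, bind per row.
  val insert = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")

  // Path 1: individual async inserts - one request per row, fired concurrently.
  def writeAsync(rows: Seq[(String, String)]): Seq[ResultSetFuture] =
    rows.map { case (id, payload) => session.executeAsync(insert.bind(id, payload)) }

  // Path 2: a single unlogged batch - one request carrying all the rows.
  def writeUnloggedBatch(rows: Seq[(String, String)]): ResultSetFuture = {
    val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
    rows.foreach { case (id, payload) => batch.add(insert.bind(id, payload)) }
    session.executeAsync(batch)
  }
}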

I compiled those observations in the document below, which I'm sharing and
opening up for comments. Are we observing some artifact, or should we set the
record straight and recommend unlogged batches for better insertion throughput?

https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI

Let me know.

Kind regards,

Gerard.

Re: To batch or not to batch: A question for fast inserts

Posted by Graham Sanderson <gr...@vast.com>.
We are about to prototype upgrading our batch inserts, so I’m really glad about this thread… we are able to saturate our dedicated network links from Hadoop when inserting via the Thrift API (Astyanax) - at the time we wrote that code, CQL wasn’t there.

Reasons to replace our current solution:

1) We don’t have GC problems, but Thrift’s lack of streaming (or the C* decision not to have Quaid turn it on)… definitely means we allocate a lot more memory on the server than we should.
2) Lack of token awareness.
3) Thrift is not given any love.

Tuning the batch size is key - apart from the performance sweet spot, we had a bug at first where we submitted batches by size rather than by number of mutations. We store “key frame” and then deltas of our data, so in the delta case we were batching up to about 50k mutations, which might be OK for throughput but is horrible if one mutation fails! A rough sketch of the cap we should have had is below.
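
Purely illustrative - a minimal way to cap unlogged batches by mutation count
instead of by accumulated byte size (the 100-statement cap is an assumption to
tune, not a recommendation):

import com.datastax.driver.core.{BatchStatement, Session, Statement}

// Split the mutations into chunks of at most `maxPerBatch` statements and send
// each chunk as one unlogged batch.
def executeInChunks(session: Session, mutations: Seq[Statement],
                    maxPerBatch: Int = 100): Unit =
  mutations.grouped(maxPerBatch).foreach { chunk =>
    val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
    chunk.foreach(s => batch.add(s))
    session.execute(batch) // synchronous here for brevity; executeAsync in practice
  }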

> On Sep 27, 2015, at 11:45 AM, Gerard Maas <ge...@gmail.com> wrote:
> 
> Hi Eric, Ryan,
> 
> Thanks a lot for your insights. I got more than I hoped for in this discussion.
> I'll further improve our code to include the replica-awareness and will compare that to the previous tests.
> 
> That snipped of code is really helpful. Thanks.
> 
> I have not been in the list long enough to have read the discussion you mention. And search delivers too many results. Any pointer to that? 
> What are the doom scenarios that unlogged batch would bring along? 
> 
> When we ported the results of our tests to our application logic, the Cassandra cluster CPU load dropped 66% compared to the same ingest rate using single statement async loading. We are certainly a step further in this area.
> 
> I'll drop our findings in a blog post. When I did my initial research, I didn't find any material that would support unlogged batch as being an alternative for performance insertion. The blogosphere seems to support async as the only approach. As Ryan mentioned, that is case dependent and I think devs should be exposed to the pro's and con's of every alternative to enable them to evaluate the best approach for their particular scenario.
> 
> Thanks a ton!
> 
> Kind regards, Gerard.
> 
> On Fri, Sep 25, 2015 at 10:04 PM, Eric Stevens <mightye@gmail.com <ma...@gmail.com>> wrote:
> Yep, my approach is definitely naive to hotspotting.  If someone had that trouble, they could exhaust the iterator out of getReplicas() and distribute their writes more evenly (which might result in better statement distribution, but wouldn't change the workload on the cluster).  In the end they're going to get in trouble with hotspotting regardless of async single statements or batches.  The single statement async code prefers the first replica returned, so this logic is consistent with the default model.
> 
> > Lots of folks are still stuck on maximum utilization, ironically these same people tend to focus on using spindles for storage and so will ultimately end up having to throttle ingest to allow compaction to catch up
> 
> Yeah, otherwise known cost sensitivity, with the unfortunate side effect of making it easy to accidentally overwhelm a cluster as a new operator since the warning signs look different than they do for most other data stores.  
> 
> Straying a bit far afield here, but I actually think it would be a nice feature if by default Cassandra artificially throttled writes as compaction starts getting behind as an early warning sign (a feature you could turn off with a flag).  Cassandra does a great job of absorbing bursty writes, but unfortunately that masks (for the new operator) the warning signs that your sustained write rate is more than the cluster can handle.  Writes are still fast so you assume the cluster is healthy, and by the time there's backpressure to the client, you're already possibly past the point of simple recovery (eg you no longer have enough excess IO to support bootstrapping new nodes).  That would also actually free up some I/O to keep the cluster from tipping over so hard.
> 
> On Fri, Sep 25, 2015 at 12:14 PM, Ryan Svihla <rs@foundev.pro <ma...@foundev.pro>> wrote:
> 
> I think my main point is still, unlogged token aware batches are great, but if you’re writes are large enough, they may actually hurt rather than help, and likewise if your writes are too small, async only is likely only going to hurt. I’d say the average user I’ve had to help (with my selection bias) has individual writes already on the large size of optimal so batching frequently hurts them. Also they tend not to do async in the first place.
> 
> In summary, batch or not is IMO the wrong area to focus, total write payload sizing for your cluster is the factor to focus on and however you get there is fantastic. more replies inline:
> 
>> On Sep 25, 2015, at 1:24 PM, Eric Stevens <mightye@gmail.com <ma...@gmail.com>> wrote:
>> 
>> > compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or worse..non existent cause the hardware and data model combination result in compaction being the main throttle.
>> 
>> If your number of records to load per second is predetermined (as would be the case in any production use case), then this doesn't make any difference on compaction whether loaded as batches vs as single statements, your cluster needs to support the same number and shape of mutates either way.
> 
> Not everyone is as grown up about their cluster sizing. Lots of folks are still stuck on maximum utilization, ironically these same people tend to focus on using spindles for storage and so will ultimately end up having to throttle ingest to allow compaction to catch up. Anyway in these admittedly awful situations throttling of ingest is all too common as the commit log can basically easily outstrip compaction. 
> 
>> 
>> > if you add in token awareness to your batch..you’ve basically eliminated the primary complaint of using unlogged batches so why not do that. 
>> 
>> This is absolutely the right idea if your driver supports it, but the gain is smaller than I would have expected based on the warnings of imminent doom when we've had this conversation before.  If your driver supports token awareness, use that to group statements by primary replica and concurrently execute those that way.  Here's the code we're using (in Scala using the Java driver):
>> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
>>   val meta = session.getCluster.getMetadata
>>   statements.groupBy { st =>
>>     try {
>>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>>     } catch { case NonFatal(e) =>
>>       null
>>     }
>>   } mapValues { st => CQLBatch(st) }
>> }
>> We now have a map of primary host to sub-batch for all the statements in our logical batch.  We can now do either of these (depending on how greedy we want to be in our client; Future.traverse is preferred and nicer, Future.sequence is greedier and more resource intensive):
>> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
>> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
>> We get back Future[Iterable[ResultSet]] - this future completes when the logical batch's sub-batches have all completed.
>> 
>> Note that with the DSE Java driver, for the above to succeed in its intent, the statements need to be prepared statements (for st.getRoutingKey to return non-null), and either the keyspace has to be fully defined in the CQL, or you have to have set the correct keyspace when you created the connection (for st.getKeyspace to return non-null).  Otherwise the values given to meta.getReplicas will fail to resolve a primary host which results in doing token-unaware batches (i.e. you'll get back a Map(null -> allStatements)).  However those same criteria are required for single statements to be token aware.
>> 
> 
> This is excellent stuff, my only concern with primary replicas is for people with uneven partitions, and the occasionally stupidly fat one. I’d rather spread those writes around the other replicas instead of beating up the primary one. However, for a well modeled partition key the approach you outline is probably optimal.
> 
>> 
>> 
>> 
>> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs@foundev.pro <ma...@foundev.pro>> wrote:
>> Generally this is all correct but I cannot emphasize enough how much this “just depends” and today I generally move people to async inserts first before trying to micro-optimize some things to keep in mind.
>> 
>> compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or worse..non existent cause the hardware and data model combination result in compaction being the main throttle.
>> if you add in token awareness to your batch..you’ve basically eliminated the primary complaint of using unlogged batches so why not do that. When I was at DataStax I made some similar suggestions for token aware batch after seeing the perf improvements with Spark writes using unlogged batch. Several others did as well so I’m not the first one with this idea.
>> write size makes in my experience the largest difference BY FAR about which is faster. and the number is largely irrelevant compared to the total payload size. Depending on the hardware and etc a good rule of thumb is writes below 1k bytes tend to get really inefficient and writes that are over 100k tend to slow down total throughput. I’ll reemphasize this magic number has been different on almost every cluster I’ve tuned.
>> 
>> In summary all this means is, too small or too large of writes are slow, and unlogged batches may involve some extra hops, if you eliminate the extra hops by token awareness then it just comes down to write size optimization.
>> 
>>> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mightye@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> > I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
>>> 
>>> My own testing agrees very strongly with this.  When this topic came up on this list before, there was a concern that batch coordination produces GC pressure in your cluster because you're involving nodes which aren't strictly speaking necessary to be involved.  
>>> 
>>> Our own testing shows some small impact on this front, but really lightweight GC tuning mitigated the effects by putting a little more room in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what we run in production) we weren't able to measure a difference. 
>>> 
>>> Our testing shows data loads being as much as 5x to 8x faster when using small concurrent batches over using single statements concurrently.  We tried three different concurrency models.
>>> 
>>> To save on coordinator overhead, we group the statements in our "batch" by replica (using the functionality exposed by the DataStax Java driver), and do essentially token aware batching.  This still has a small amount of additional coordinator overhead (since the data size of the unit of work is larger, and sits in memory in the coordinator longer).  We've been running this way successfully for months with sustained rates north of 50,000 mutates per second.  We burst much higher.
>>> 
>>> Through trial and error we determined we got diminishing returns in the realm of 100 statements per token-aware batch.  It looks like your own data bears that out as well.  I'm sure that's workload dependent though.
>>> 
>>> I've been disagreed with on this topic in this list in the past despite the numbers I was able to post.  Nobody has shown me numbers (nor anything else concrete) that contradict my position though, so I stand by it.  There's no question in my mind, if your mutates are of any significant volume and you care about the performance of them, token aware unlogged batching is the right strategy.  When we reduce our batch sizes or switch to single async statements, we fall over immediately.  
>>> 
>>> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <gerard.maas@gmail.com <ma...@gmail.com>> wrote:
>>> General advice advocates for individual async inserts as the fastest way to insert data into Cassandra. Our insertion mechanism is based on that model and recently we have been evaluating performance, looking to measure and optimize our ingestion rate.
>>> 
>>> I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
>>> 
>>> In our tests, unlogged batch shows increased throughput and lower cluster CPU usage, so I'm wondering where the tradeoff might be.
>>> 
>>> I compiled those observations in this document that I'm sharing and opening up for comments.  Are we observing some artifact or should we set the record straight for unlogged batches to achieve better insertion throughput?
>>> 
>>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI <https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI>
>>> 
>>> Let me know.
>>> 
>>> Kind regards, 
>>> 
>>> Gerard.
>>> 
>> 
>> Regards,
>> 
>> Ryan Svihla
>> 
>> 
> 
> 
> 


Re: To batch or not to batch: A question for fast inserts

Posted by Gerard Maas <ge...@gmail.com>.
Hi Eric, Ryan,

Thanks a lot for your insights. I got more than I hoped for in this
discussion.
I'll further improve our code to include the replica-awareness and will
compare that to the previous tests.

That snippet of code is really helpful. Thanks.

I have not been on the list long enough to have read the discussion you
mention, and search delivers too many results. Any pointer to that?
What are the doom scenarios that unlogged batch would bring along?

When we applied the results of our tests to our application logic, the
Cassandra cluster CPU load dropped 66% compared to the same ingest rate
using single-statement async loading. We are certainly a step further in
this area.

I'll write up our findings in a blog post. When I did my initial research, I
didn't find any material presenting unlogged batches as an alternative for
high-throughput insertion. The blogosphere seems to support async as the only
approach. As Ryan mentioned, that is case dependent, and I think devs should be
exposed to the pros and cons of every alternative so they can evaluate the best
approach for their particular scenario.

Thanks a ton!

Kind regards, Gerard.

On Fri, Sep 25, 2015 at 10:04 PM, Eric Stevens <mi...@gmail.com> wrote:

> Yep, my approach is definitely naive to hotspotting.  If someone had that
> trouble, they could exhaust the iterator out of getReplicas() and
> distribute their writes more evenly (which might result in better statement
> distribution, but wouldn't change the workload on the cluster).  In the end
> they're going to get in trouble with hotspotting regardless of async single
> statements or batches.  The single statement async code prefers the first
> replica returned, so this logic is consistent with the default model.
>
> > Lots of folks are still stuck on maximum utilization, ironically these
> same people tend to focus on using spindles for storage and so will
> ultimately end up having to throttle ingest to allow compaction to catch up
>
> Yeah, otherwise known cost sensitivity, with the unfortunate side effect
> of making it easy to accidentally overwhelm a cluster as a new operator
> since the warning signs look different than they do for most other data
> stores.
>
> Straying a bit far afield here, but I actually think it would be a nice
> feature if *by default* Cassandra artificially throttled writes as
> compaction starts getting behind as an early warning sign (a feature you
> could turn off with a flag).  Cassandra does a great job of absorbing
> bursty writes, but unfortunately that masks (for the new operator) the
> warning signs that your sustained write rate is more than the cluster can
> handle.  Writes are still fast so you assume the cluster is healthy, and by
> the time there's backpressure to the client, you're already possibly past
> the point of simple recovery (eg you no longer have enough excess IO to
> support bootstrapping new nodes).  That would also actually free up some
> I/O to keep the cluster from tipping over so hard.
>
> On Fri, Sep 25, 2015 at 12:14 PM, Ryan Svihla <rs...@foundev.pro> wrote:
>
>>
>> I think my main point is still, unlogged token aware batches are great,
>> but if you’re writes are large enough, they may actually hurt rather than
>> help, and likewise if your writes are too small, async only is likely only
>> going to hurt. I’d say the average user I’ve had to help (with my selection
>> bias) has individual writes already on the large size of optimal so
>> batching frequently hurts them. Also they tend not to do async in the first
>> place.
>>
>> In summary, batch or not is IMO the wrong area to focus, total write
>> payload sizing for your cluster is the factor to focus on and however you
>> get there is fantastic. more replies inline:
>>
>> On Sep 25, 2015, at 1:24 PM, Eric Stevens <mi...@gmail.com> wrote:
>>
>> > compaction usually is the limiter for most clusters, so the difference
>> between async versus unlogged batch ends up being minor or worse..non
>> existent cause the hardware and data model combination result in compaction
>> being the main throttle.
>>
>> If your number of records to load per second is predetermined (as would
>> be the case in any production use case), then this doesn't make any
>> difference on compaction whether loaded as batches vs as single statements,
>> your cluster needs to support the same number and shape of mutates either
>> way.
>>
>>
>> Not everyone is as grown up about their cluster sizing. Lots of folks are
>> still stuck on maximum utilization, ironically these same people tend to
>> focus on using spindles for storage and so will ultimately end up having to
>> throttle ingest to allow compaction to catch up. Anyway in these admittedly
>> awful situations throttling of ingest is all too common as the commit log
>> can basically easily outstrip compaction.
>>
>>
>> > if you add in token awareness to your batch..you’ve basically
>> eliminated the primary complaint of using unlogged batches so why not do
>> that.
>>
>> This is absolutely the right idea if your driver supports it, but the
>> gain is smaller than I would have expected based on the warnings
>> of imminent doom when we've had this conversation before.  If your driver
>> supports token awareness, use that to group statements by primary replica
>> and concurrently execute those that way.  Here's the code we're using (in
>> Scala using the Java driver):
>>
>> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
>>   val meta = session.getCluster.getMetadata
>>   statements.groupBy { st =>
>>     try {
>>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>>     } catch { case NonFatal(e) =>
>>       null
>>     }
>>   } mapValues { st => CQLBatch(st) }
>> }
>>
>> We now have a map of primary host to sub-batch for all the statements in
>> our logical batch.  We can now do either of these (depending on how greedy
>> we want to be in our client; Future.traverse is preferred and nicer,
>> Future.sequence is greedier and more resource intensive):
>>
>> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
>> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
>>
>> We get back Future[Iterable[ResultSet]] - this future completes when the
>> logical batch's sub-batches have all completed.
>>
>> Note that with the DSE Java driver, for the above to succeed in its
>> intent, the statements need to be prepared statements (for st.getRoutingKey
>> to return non-null), and either the keyspace has to be fully defined in the
>> CQL, or you have to have set the correct keyspace when you created the
>> connection (for st.getKeyspace to return non-null).  Otherwise the values
>> given to meta.getReplicas will fail to resolve a primary host which results
>> in doing token-unaware batches (i.e. you'll get back a Map(null ->
>> allStatements)).  However those same criteria are required for single
>> statements to be token aware.
>>
>>
>> This is excellent stuff, my only concern with primary replicas is for
>> people with uneven partitions, and the occasionally stupidly fat one. I’d
>> rather spread those writes around the other replicas instead of beating up
>> the primary one. However, for a well modeled partition key the approach you
>> outline is probably optimal.
>>
>>
>>
>>
>> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs...@foundev.pro> wrote:
>>
>>> Generally this is all correct but I cannot emphasize enough how much
>>> this “just depends” and today I generally move people to async inserts
>>> first before trying to micro-optimize some things to keep in mind.
>>>
>>>
>>>    - compaction usually is the limiter for most clusters, so the
>>>    difference between async versus unlogged batch ends up being minor or
>>>    worse..non existent cause the hardware and data model combination result in
>>>    compaction being the main throttle.
>>>    - if you add in token awareness to your batch..you’ve basically
>>>    eliminated the primary complaint of using unlogged batches so why not do
>>>    that. When I was at DataStax I made some similar suggestions for token
>>>    aware batch after seeing the perf improvements with Spark writes using
>>>    unlogged batch. Several others did as well so I’m not the first one with
>>>    this idea.
>>>    - write size makes in my experience the largest difference BY FAR
>>>    about which is faster. and the number is largely irrelevant compared to the
>>>    total payload size. Depending on the hardware and etc a good rule of thumb
>>>    is writes below 1k bytes tend to get really inefficient and writes that are
>>>    over 100k tend to slow down total throughput. I’ll reemphasize this magic
>>>    number has been different on almost every cluster I’ve tuned.
>>>
>>>
>>> In summary all this means is, too small or too large of writes are slow,
>>> and unlogged batches may involve some extra hops, if you eliminate the
>>> extra hops by token awareness then it just comes down to write size
>>> optimization.
>>>
>>> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mi...@gmail.com> wrote:
>>>
>>> > I side-tracked some punctual benchmarks and stumbled on the
>>> observations of unlogged inserts being *A LOT* faster than the async
>>> counterparts.
>>>
>>> My own testing agrees very strongly with this.  When this topic came up
>>> on this list before, there was a concern that batch coordination produces
>>> GC pressure in your cluster because you're involving nodes which aren't *strictly
>>> speaking* necessary to be involved.
>>>
>>> Our own testing shows some small impact on this front, but really
>>> lightweight GC tuning mitigated the effects by putting a little more room
>>> in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what
>>> we run in production) we weren't able to measure a difference.
>>>
>>> Our testing shows data loads being as much as 5x to 8x faster when using
>>> small concurrent batches over using single statements concurrently.  We
>>> tried three different concurrency models.
>>>
>>> To save on coordinator overhead, we group the statements in our "batch"
>>> by replica (using the functionality exposed by the DataStax Java driver),
>>> and do essentially token aware batching.  This still has a *small* amount
>>> of additional coordinator overhead (since the data size of the unit of work
>>> is larger, and sits in memory in the coordinator longer).  We've been
>>> running this way successfully for months with *sustained* rates north
>>> of 50,000 mutates per second.  We burst *much* higher.
>>>
>>> Through trial and error we determined we got diminishing returns in the
>>> realm of 100 statements per token-aware batch.  It looks like your own data
>>> bears that out as well.  I'm sure that's workload dependent though.
>>>
>>> I've been disagreed with on this topic in this list in the past despite
>>> the numbers I was able to post.  Nobody has shown me numbers (nor anything
>>> else concrete) that contradict my position though, so I stand by it.  There's
>>> no question in my mind, if your mutates are of any significant volume and
>>> you care about the performance of them, token aware unlogged batching is
>>> the right strategy.  When we reduce our batch sizes or switch to single
>>> async statements, we fall over immediately.
>>>
>>> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <ge...@gmail.com>
>>> wrote:
>>>
>>>> General advice advocates for individual async inserts as the fastest
>>>> way to insert data into Cassandra. Our insertion mechanism is based on that
>>>> model and recently we have been evaluating performance, looking to measure
>>>> and optimize our ingestion rate.
>>>>
>>>> I side-tracked some punctual benchmarks and stumbled on the
>>>> observations of unlogged inserts being *A LOT* faster than the async
>>>> counterparts.
>>>>
>>>> In our tests, unlogged batch shows increased throughput and lower
>>>> cluster CPU usage, so I'm wondering where the tradeoff might be.
>>>>
>>>> I compiled those observations in this document that I'm sharing and
>>>> opening up for comments.  Are we observing some artifact or should we set
>>>> the record straight for unlogged batches to achieve better insertion
>>>> throughput?
>>>>
>>>>
>>>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>>>>
>>>> Let me know.
>>>>
>>>> Kind regards,
>>>>
>>>> Gerard.
>>>>
>>>
>>>
>>> Regards,
>>>
>>> Ryan Svihla
>>>
>>>
>>
>>
>

Re: To batch or not to batch: A question for fast inserts

Posted by Eric Stevens <mi...@gmail.com>.
Yep, my approach is definitely naive to hotspotting.  If someone had that
trouble, they could exhaust the iterator out of getReplicas() and
distribute their writes more evenly (which might result in better statement
distribution, but wouldn't change the workload on the cluster).  In the end
they're going to get in trouble with hotspotting regardless of async single
statements or batches.  The single statement async code prefers the first
replica returned, so this logic is consistent with the default model.
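
For illustration, a sketch of that "exhaust the iterator" idea - spreading each
statement across the full replica set (round-robin by position) instead of
always preferring the first replica. The plain Session type and the round-robin
choice are assumptions, not what we actually run:

import scala.collection.JavaConverters._
import scala.util.control.NonFatal
import com.datastax.driver.core.{Host, Session, Statement}

def groupBySpreadReplica(statements: Seq[Statement])
                        (implicit session: Session): Map[Host, Seq[Statement]] =
  statements.zipWithIndex.groupBy { case (st, i) =>
    try {
      val replicas = session.getCluster.getMetadata
        .getReplicas(st.getKeyspace, st.getRoutingKey).asScala.toIndexedSeq
      if (replicas.isEmpty) null else replicas(i % replicas.size) // rotate over replicas
    } catch { case NonFatal(_) => null } // unresolvable -> one token-unaware group
  }.mapValues(_.map(_._1))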

> Lots of folks are still stuck on maximum utilization, ironically these
same people tend to focus on using spindles for storage and so will
ultimately end up having to throttle ingest to allow compaction to catch up

Yeah, otherwise known as cost sensitivity, with the unfortunate side effect of
making it easy for a new operator to accidentally overwhelm a cluster, since
the warning signs look different than they do for most other data stores.

Straying a bit far afield here, but I actually think it would be a nice
feature if *by default* Cassandra artificially throttled writes as
compaction starts getting behind as an early warning sign (a feature you
could turn off with a flag).  Cassandra does a great job of absorbing
bursty writes, but unfortunately that masks (for the new operator) the
warning signs that your sustained write rate is more than the cluster can
handle.  Writes are still fast so you assume the cluster is healthy, and by
the time there's backpressure to the client, you're already possibly past
the point of simple recovery (eg you no longer have enough excess IO to
support bootstrapping new nodes).  That would also actually free up some
I/O to keep the cluster from tipping over so hard.
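
On the client side, the crude version of that guard is to bound the number of
in-flight async writes so the producer blocks instead of silently piling work
onto the cluster. Just an illustrative sketch (the limit of 256 is a made-up
starting point, and the two-argument Futures.addCallback is the Guava version
bundled with the 2.x driver):

import java.util.concurrent.Semaphore
import com.google.common.util.concurrent.{FutureCallback, Futures}
import com.datastax.driver.core.{ResultSet, Session, Statement}

// At most `maxInFlight` async writes outstanding at any time.
class ThrottledWriter(session: Session, maxInFlight: Int = 256) {
  private val permits = new Semaphore(maxInFlight)

  def write(st: Statement): Unit = {
    permits.acquire() // blocks the producer when the cluster falls behind
    val f = session.executeAsync(st)
    Futures.addCallback(f, new FutureCallback[ResultSet] {
      def onSuccess(r: ResultSet): Unit = permits.release()
      def onFailure(t: Throwable): Unit = permits.release()
    })
  }
}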

On Fri, Sep 25, 2015 at 12:14 PM, Ryan Svihla <rs...@foundev.pro> wrote:

>
> I think my main point is still, unlogged token aware batches are great,
> but if you’re writes are large enough, they may actually hurt rather than
> help, and likewise if your writes are too small, async only is likely only
> going to hurt. I’d say the average user I’ve had to help (with my selection
> bias) has individual writes already on the large size of optimal so
> batching frequently hurts them. Also they tend not to do async in the first
> place.
>
> In summary, batch or not is IMO the wrong area to focus, total write
> payload sizing for your cluster is the factor to focus on and however you
> get there is fantastic. more replies inline:
>
> On Sep 25, 2015, at 1:24 PM, Eric Stevens <mi...@gmail.com> wrote:
>
> > compaction usually is the limiter for most clusters, so the difference
> between async versus unlogged batch ends up being minor or worse..non
> existent cause the hardware and data model combination result in compaction
> being the main throttle.
>
> If your number of records to load per second is predetermined (as would be
> the case in any production use case), then this doesn't make any difference
> on compaction whether loaded as batches vs as single statements, your
> cluster needs to support the same number and shape of mutates either way.
>
>
> Not everyone is as grown up about their cluster sizing. Lots of folks are
> still stuck on maximum utilization, ironically these same people tend to
> focus on using spindles for storage and so will ultimately end up having to
> throttle ingest to allow compaction to catch up. Anyway in these admittedly
> awful situations throttling of ingest is all too common as the commit log
> can basically easily outstrip compaction.
>
>
> > if you add in token awareness to your batch..you’ve basically
> eliminated the primary complaint of using unlogged batches so why not do
> that.
>
> This is absolutely the right idea if your driver supports it, but the gain
> is smaller than I would have expected based on the warnings
> of imminent doom when we've had this conversation before.  If your driver
> supports token awareness, use that to group statements by primary replica
> and concurrently execute those that way.  Here's the code we're using (in
> Scala using the Java driver):
>
> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
>   val meta = session.getCluster.getMetadata
>   statements.groupBy { st =>
>     try {
>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>     } catch { case NonFatal(e) =>
>       null
>     }
>   } mapValues { st => CQLBatch(st) }
> }
>
> We now have a map of primary host to sub-batch for all the statements in
> our logical batch.  We can now do either of these (depending on how greedy
> we want to be in our client; Future.traverse is preferred and nicer,
> Future.sequence is greedier and more resource intensive):
>
> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
>
> We get back Future[Iterable[ResultSet]] - this future completes when the
> logical batch's sub-batches have all completed.
>
> Note that with the DSE Java driver, for the above to succeed in its
> intent, the statements need to be prepared statements (for st.getRoutingKey
> to return non-null), and either the keyspace has to be fully defined in the
> CQL, or you have to have set the correct keyspace when you created the
> connection (for st.getKeyspace to return non-null).  Otherwise the values
> given to meta.getReplicas will fail to resolve a primary host which results
> in doing token-unaware batches (i.e. you'll get back a Map(null ->
> allStatements)).  However those same criteria are required for single
> statements to be token aware.
>
>
> This is excellent stuff, my only concern with primary replicas is for
> people with uneven partitions, and the occasionally stupidly fat one. I’d
> rather spread those writes around the other replicas instead of beating up
> the primary one. However, for a well modeled partition key the approach you
> outline is probably optimal.
>
>
>
>
> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs...@foundev.pro> wrote:
>
>> Generally this is all correct but I cannot emphasize enough how much this
>> “just depends” and today I generally move people to async inserts first
>> before trying to micro-optimize some things to keep in mind.
>>
>>
>>    - compaction usually is the limiter for most clusters, so the
>>    difference between async versus unlogged batch ends up being minor or
>>    worse..non existent cause the hardware and data model combination result in
>>    compaction being the main throttle.
>>    - if you add in token awareness to your batch..you’ve basically
>>    eliminated the primary complaint of using unlogged batches so why not do
>>    that. When I was at DataStax I made some similar suggestions for token
>>    aware batch after seeing the perf improvements with Spark writes using
>>    unlogged batch. Several others did as well so I’m not the first one with
>>    this idea.
>>    - write size makes in my experience the largest difference BY FAR
>>    about which is faster. and the number is largely irrelevant compared to the
>>    total payload size. Depending on the hardware and etc a good rule of thumb
>>    is writes below 1k bytes tend to get really inefficient and writes that are
>>    over 100k tend to slow down total throughput. I’ll reemphasize this magic
>>    number has been different on almost every cluster I’ve tuned.
>>
>>
>> In summary all this means is, too small or too large of writes are slow,
>> and unlogged batches may involve some extra hops, if you eliminate the
>> extra hops by token awareness then it just comes down to write size
>> optimization.
>>
>> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mi...@gmail.com> wrote:
>>
>> > I side-tracked some punctual benchmarks and stumbled on the
>> observations of unlogged inserts being *A LOT* faster than the async
>> counterparts.
>>
>> My own testing agrees very strongly with this.  When this topic came up
>> on this list before, there was a concern that batch coordination produces
>> GC pressure in your cluster because you're involving nodes which aren't *strictly
>> speaking* necessary to be involved.
>>
>> Our own testing shows some small impact on this front, but really
>> lightweight GC tuning mitigated the effects by putting a little more room
>> in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what
>> we run in production) we weren't able to measure a difference.
>>
>> Our testing shows data loads being as much as 5x to 8x faster when using
>> small concurrent batches over using single statements concurrently.  We
>> tried three different concurrency models.
>>
>> To save on coordinator overhead, we group the statements in our "batch"
>> by replica (using the functionality exposed by the DataStax Java driver),
>> and do essentially token aware batching.  This still has a *small* amount
>> of additional coordinator overhead (since the data size of the unit of work
>> is larger, and sits in memory in the coordinator longer).  We've been
>> running this way successfully for months with *sustained* rates north of
>> 50,000 mutates per second.  We burst *much* higher.
>>
>> Through trial and error we determined we got diminishing returns in the
>> realm of 100 statements per token-aware batch.  It looks like your own data
>> bears that out as well.  I'm sure that's workload dependent though.
>>
>> I've been disagreed with on this topic in this list in the past despite
>> the numbers I was able to post.  Nobody has shown me numbers (nor anything
>> else concrete) that contradict my position though, so I stand by it.  There's
>> no question in my mind, if your mutates are of any significant volume and
>> you care about the performance of them, token aware unlogged batching is
>> the right strategy.  When we reduce our batch sizes or switch to single
>> async statements, we fall over immediately.
>>
>> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <ge...@gmail.com>
>> wrote:
>>
>>> General advice advocates for individual async inserts as the fastest way
>>> to insert data into Cassandra. Our insertion mechanism is based on that
>>> model and recently we have been evaluating performance, looking to measure
>>> and optimize our ingestion rate.
>>>
>>> I side-tracked some punctual benchmarks and stumbled on the observations
>>> of unlogged inserts being *A LOT* faster than the async counterparts.
>>>
>>> In our tests, unlogged batch shows increased throughput and lower
>>> cluster CPU usage, so I'm wondering where the tradeoff might be.
>>>
>>> I compiled those observations in this document that I'm sharing and
>>> opening up for comments.  Are we observing some artifact or should we set
>>> the record straight for unlogged batches to achieve better insertion
>>> throughput?
>>>
>>>
>>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>>>
>>> Let me know.
>>>
>>> Kind regards,
>>>
>>> Gerard.
>>>
>>
>>
>> Regards,
>>
>> Ryan Svihla
>>
>>
>
>

Re: To batch or not to batch: A question for fast inserts

Posted by Ryan Svihla <rs...@foundev.pro>.
I think my main point is still: unlogged token-aware batches are great, but if your writes are large enough, they may actually hurt rather than help, and likewise if your writes are too small, async-only is likely to hurt. I’d say the average user I’ve had to help (with my selection bias) has individual writes already on the large side of optimal, so batching frequently hurts them. Also, they tend not to do async in the first place.

In summary, batch or not is IMO the wrong area to focus on; total write payload sizing for your cluster is the factor to focus on, and however you get there is fantastic. More replies inline:

> On Sep 25, 2015, at 1:24 PM, Eric Stevens <mi...@gmail.com> wrote:
> 
> > compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or worse..non existent cause the hardware and data model combination result in compaction being the main throttle.
> 
> If your number of records to load per second is predetermined (as would be the case in any production use case), then this doesn't make any difference on compaction whether loaded as batches vs as single statements, your cluster needs to support the same number and shape of mutates either way.

Not everyone is as grown up about their cluster sizing. Lots of folks are still stuck on maximum utilization; ironically, these same people tend to focus on using spindles for storage and so will ultimately end up having to throttle ingest to allow compaction to catch up. Anyway, in these admittedly awful situations, throttling of ingest is all too common, as the commit log can easily outstrip compaction.

> 
> > if you add in token awareness to your batch..you’ve basically eliminated the primary complaint of using unlogged batches so why not do that. 
> 
> This is absolutely the right idea if your driver supports it, but the gain is smaller than I would have expected based on the warnings of imminent doom when we've had this conversation before.  If your driver supports token awareness, use that to group statements by primary replica and concurrently execute those that way.  Here's the code we're using (in Scala using the Java driver):
> def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
>   val meta = session.getCluster.getMetadata
>   statements.groupBy { st =>
>     try {
>       meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
>     } catch { case NonFatal(e) =>
>       null
>     }
>   } mapValues { st => CQLBatch(st) }
> }
> We now have a map of primary host to sub-batch for all the statements in our logical batch.  We can now do either of these (depending on how greedy we want to be in our client; Future.traverse is preferred and nicer, Future.sequence is greedier and more resource intensive):
> Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
> Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)
> We get back Future[Iterable[ResultSet]] - this future completes when the logical batch's sub-batches have all completed.
> 
> Note that with the DSE Java driver, for the above to succeed in its intent, the statements need to be prepared statements (for st.getRoutingKey to return non-null), and either the keyspace has to be fully defined in the CQL, or you have to have set the correct keyspace when you created the connection (for st.getKeyspace to return non-null).  Otherwise the values given to meta.getReplicas will fail to resolve a primary host which results in doing token-unaware batches (i.e. you'll get back a Map(null -> allStatements)).  However those same criteria are required for single statements to be token aware.
> 

This is excellent stuff, my only concern with primary replicas is for people with uneven partitions, and the occasionally stupidly fat one. I’d rather spread those writes around the other replicas instead of beating up the primary one. However, for a well modeled partition key the approach you outline is probably optimal.

> 
> 
> 
> On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs@foundev.pro <ma...@foundev.pro>> wrote:
> Generally this is all correct but I cannot emphasize enough how much this “just depends” and today I generally move people to async inserts first before trying to micro-optimize some things to keep in mind.
> 
> compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or worse..non existent cause the hardware and data model combination result in compaction being the main throttle.
> if you add in token awareness to your batch..you’ve basically eliminated the primary complaint of using unlogged batches so why not do that. When I was at DataStax I made some similar suggestions for token aware batch after seeing the perf improvements with Spark writes using unlogged batch. Several others did as well so I’m not the first one with this idea.
> write size makes in my experience the largest difference BY FAR about which is faster. and the number is largely irrelevant compared to the total payload size. Depending on the hardware and etc a good rule of thumb is writes below 1k bytes tend to get really inefficient and writes that are over 100k tend to slow down total throughput. I’ll reemphasize this magic number has been different on almost every cluster I’ve tuned.
> 
> In summary all this means is, too small or too large of writes are slow, and unlogged batches may involve some extra hops, if you eliminate the extra hops by token awareness then it just comes down to write size optimization.
> 
>> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mightye@gmail.com <ma...@gmail.com>> wrote:
>> 
>> > I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
>> 
>> My own testing agrees very strongly with this.  When this topic came up on this list before, there was a concern that batch coordination produces GC pressure in your cluster because you're involving nodes which aren't strictly speaking necessary to be involved.  
>> 
>> Our own testing shows some small impact on this front, but really lightweight GC tuning mitigated the effects by putting a little more room in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what we run in production) we weren't able to measure a difference. 
>> 
>> Our testing shows data loads being as much as 5x to 8x faster when using small concurrent batches over using single statements concurrently.  We tried three different concurrency models.
>> 
>> To save on coordinator overhead, we group the statements in our "batch" by replica (using the functionality exposed by the DataStax Java driver), and do essentially token aware batching.  This still has a small amount of additional coordinator overhead (since the data size of the unit of work is larger, and sits in memory in the coordinator longer).  We've been running this way successfully for months with sustained rates north of 50,000 mutates per second.  We burst much higher.
>> 
>> Through trial and error we determined we got diminishing returns in the realm of 100 statements per token-aware batch.  It looks like your own data bears that out as well.  I'm sure that's workload dependent though.
>> 
>> I've been disagreed with on this topic in this list in the past despite the numbers I was able to post.  Nobody has shown me numbers (nor anything else concrete) that contradict my position though, so I stand by it.  There's no question in my mind, if your mutates are of any significant volume and you care about the performance of them, token aware unlogged batching is the right strategy.  When we reduce our batch sizes or switch to single async statements, we fall over immediately.  
>> 
>> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <gerard.maas@gmail.com <ma...@gmail.com>> wrote:
>> General advice advocates for individual async inserts as the fastest way to insert data into Cassandra. Our insertion mechanism is based on that model and recently we have been evaluating performance, looking to measure and optimize our ingestion rate.
>> 
>> I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
>> 
>> In our tests, unlogged batch shows increased throughput and lower cluster CPU usage, so I'm wondering where the tradeoff might be.
>> 
>> I compiled those observations in this document that I'm sharing and opening up for comments.  Are we observing some artifact or should we set the record straight for unlogged batches to achieve better insertion throughput?
>> 
>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI <https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI>
>> 
>> Let me know.
>> 
>> Kind regards, 
>> 
>> Gerard.
>> 
> 
> Regards,
> 
> Ryan Svihla
> 
> 


Re: To batch or not to batch: A question for fast inserts

Posted by Eric Stevens <mi...@gmail.com>.
> compaction usually is the limiter for most clusters, so the difference
between async versus unlogged batch ends up being minor or worse..non
existent cause the hardware and data model combination result in compaction
being the main throttle.

If your number of records to load per second is predetermined (as would be
the case in any production use case), then it makes no difference to compaction
whether the data is loaded as batches or as single statements; your cluster
needs to support the same number and shape of mutates either way.

> if you add in token awareness to your batch..you’ve basically eliminated
the primary complaint of using unlogged batches so why not do that.

This is absolutely the right idea if your driver supports it, but the gain
is smaller than I would have expected based on the warnings
of imminent doom when we've had this conversation before.  If your driver
supports token awareness, use that to group statements by primary replica
and concurrently execute those that way.  Here's the code we're using (in
Scala using the Java driver):

import scala.util.control.NonFatal

def groupByFirstReplica()(implicit session: CQLSession): Map[Host, CQLBatch] = {
  val meta = session.getCluster.getMetadata
  statements.groupBy { st =>
    try {
      // First replica for this statement's partition (needs a non-null keyspace and routing key).
      meta.getReplicas(st.getKeyspace, st.getRoutingKey).iterator().next
    } catch { case NonFatal(e) =>
      null // could not resolve a replica - these fall into one token-unaware group
    }
  } mapValues { st => CQLBatch(st) }
}

We now have a map of primary host to sub-batch for all the statements in
our logical batch.  We can now do either of these (depending on how greedy
we want to be in our client; Future.traverse is preferred and nicer,
Future.sequence is greedier and more resource intensive):

Future.sequence(groupByFirstReplica().values.map(_.execute())).map(_.flatten)
Future.traverse(groupByFirstReplica().values) { _.execute() }.map(_.flatten)

We get back Future[Iterable[ResultSet]] - this future completes when the
logical batch's sub-batches have all completed.

Note that with the DSE Java driver, for the above to succeed in its intent,
the statements need to be prepared statements (for st.getRoutingKey to
return non-null), and either the keyspace has to be fully defined in the
CQL, or you have to have set the correct keyspace when you created the
connection (for st.getKeyspace to return non-null).  Otherwise the values
given to meta.getReplicas will fail to resolve a primary host which results
in doing token-unaware batches (i.e. you'll get back a Map(null ->
allStatements)).  However those same criteria are required for single
statements to be token aware.
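
For anyone trying to compile the snippet above: CQLSession and CQLBatch are our
own thin wrappers. A minimal stand-in for CQLBatch - an illustrative sketch,
not necessarily what our class actually does - wraps each group in an unlogged
BatchStatement and adapts the driver's ListenableFuture to a Scala Future:

import scala.concurrent.{Future, Promise}
import com.google.common.util.concurrent.{FutureCallback, Futures}
import com.datastax.driver.core.{BatchStatement, ResultSet, Session, Statement}

case class CQLBatch(statements: Seq[Statement]) {
  // Send the whole group as one unlogged batch, exposed as a Scala Future.
  def execute()(implicit session: Session): Future[ResultSet] = {
    val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
    statements.foreach(s => batch.add(s))
    val p = Promise[ResultSet]()
    Futures.addCallback(session.executeAsync(batch), new FutureCallback[ResultSet] {
      def onSuccess(rs: ResultSet): Unit = p.success(rs)
      def onFailure(t: Throwable): Unit = p.failure(t)
    })
    p.future
  }
}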




On Fri, Sep 25, 2015 at 7:30 AM, Ryan Svihla <rs...@foundev.pro> wrote:

> Generally this is all correct but I cannot emphasize enough how much this
> “just depends” and today I generally move people to async inserts first
> before trying to micro-optimize some things to keep in mind.
>
>
>    - compaction usually is the limiter for most clusters, so the
>    difference between async versus unlogged batch ends up being minor or
>    worse..non existent cause the hardware and data model combination result in
>    compaction being the main throttle.
>    - if you add in token awareness to your batch..you’ve basically
>    eliminated the primary complaint of using unlogged batches so why not do
>    that. When I was at DataStax I made some similar suggestions for token
>    aware batch after seeing the perf improvements with Spark writes using
>    unlogged batch. Several others did as well so I’m not the first one with
>    this idea.
>    - write size makes in my experience the largest difference BY FAR
>    about which is faster. and the number is largely irrelevant compared to the
>    total payload size. Depending on the hardware and etc a good rule of thumb
>    is writes below 1k bytes tend to get really inefficient and writes that are
>    over 100k tend to slow down total throughput. I’ll reemphasize this magic
>    number has been different on almost every cluster I’ve tuned.
>
>
> In summary all this means is, too small or too large of writes are slow,
> and unlogged batches may involve some extra hops, if you eliminate the
> extra hops by token awareness then it just comes down to write size
> optimization.
>
> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mi...@gmail.com> wrote:
>
> > I side-tracked some punctual benchmarks and stumbled on the
> observations of unlogged inserts being *A LOT* faster than the async
> counterparts.
>
> My own testing agrees very strongly with this.  When this topic came up on
> this list before, there was a concern that batch coordination produces GC
> pressure in your cluster because you're involving nodes which aren't *strictly
> speaking* necessary to be involved.
>
> Our own testing shows some small impact on this front, but really
> lightweight GC tuning mitigated the effects by putting a little more room
> in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what
> we run in production) we weren't able to measure a difference.
>
> Our testing shows data loads being as much as 5x to 8x faster when using
> small concurrent batches over using single statements concurrently.  We
> tried three different concurrency models.
>
> To save on coordinator overhead, we group the statements in our "batch" by
> replica (using the functionality exposed by the DataStax Java driver), and
> do essentially token aware batching.  This still has a *small* amount of
> additional coordinator overhead (since the data size of the unit of work is
> larger, and sits in memory in the coordinator longer).  We've been running
> this way successfully for months with *sustained* rates north of 50,000
> mutates per second.  We burst *much* higher.
>
> Through trial and error we determined we got diminishing returns in the
> realm of 100 statements per token-aware batch.  It looks like your own data
> bears that out as well.  I'm sure that's workload dependent though.
>
> I've been disagreed with on this topic in this list in the past despite
> the numbers I was able to post.  Nobody has shown me numbers (nor anything
> else concrete) that contradict my position though, so I stand by it.  There's
> no question in my mind, if your mutates are of any significant volume and
> you care about the performance of them, token aware unlogged batching is
> the right strategy.  When we reduce our batch sizes or switch to single
> async statements, we fall over immediately.
>
> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <ge...@gmail.com>
> wrote:
>
>> General advice advocates for individual async inserts as the fastest way
>> to insert data into Cassandra. Our insertion mechanism is based on that
>> model and recently we have been evaluating performance, looking to measure
>> and optimize our ingestion rate.
>>
>> I side-tracked some punctual benchmarks and stumbled on the observations
>> of unlogged inserts being *A LOT* faster than the async counterparts.
>>
>> In our tests, unlogged batch shows increased throughput and lower cluster
>> CPU usage, so I'm wondering where the tradeoff might be.
>>
>> I compiled those observations in this document that I'm sharing and
>> opening up for comments.  Are we observing some artifact or should we set
>> the record straight for unlogged batches to achieve better insertion
>> throughput?
>>
>>
>> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>>
>> Let me know.
>>
>> Kind regards,
>>
>> Gerard.
>>
>
>
> Regards,
>
> Ryan Svihla
>
>

Re: To batch or not to batch: A question for fast inserts

Posted by Ryan Svihla <rs...@foundev.pro>.
Generally this is all correct, but I cannot emphasize enough how much this “just depends”, and today I generally move people to async inserts first before trying to micro-optimize. Some things to keep in mind:

- Compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or, worse, non-existent, because the hardware and data model combination results in compaction being the main throttle.
- If you add in token awareness to your batch, you’ve basically eliminated the primary complaint of using unlogged batches, so why not do that. When I was at DataStax I made some similar suggestions for token-aware batch after seeing the perf improvements with Spark writes using unlogged batch. Several others did as well, so I’m not the first one with this idea.
- Write size makes, in my experience, the largest difference BY FAR in which is faster, and the number of statements is largely irrelevant compared to the total payload size. Depending on the hardware etc., a good rule of thumb is that writes below 1k bytes tend to get really inefficient and writes over 100k tend to slow down total throughput. I’ll re-emphasize that this magic number has been different on almost every cluster I’ve tuned.

In summary, all this means is: writes that are too small or too large are slow, and unlogged batches may involve some extra hops; if you eliminate the extra hops with token awareness, then it just comes down to write size optimization.
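
Purely as an illustration of what “payload sizing” can look like in code - the
50 KB target and the per-write byte estimate are assumptions you would tune per
cluster, not guidance:

import com.datastax.driver.core.Statement

// Pack (statement, estimated-bytes) pairs into batches aimed at a target payload size.
def packByPayload(writes: Seq[(Statement, Int)],
                  targetBytes: Int = 50 * 1024): Seq[Seq[Statement]] =
  writes.foldLeft(Vector(Vector.empty[(Statement, Int)])) { case (acc, w @ (_, size)) =>
    val current = acc.last
    if (current.nonEmpty && current.map(_._2).sum + size > targetBytes)
      acc :+ Vector(w)           // current batch is full - start a new one
    else
      acc.init :+ (current :+ w) // keep growing the current batch
  }.map(_.map(_._1)).filter(_.nonEmpty)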

> On Sep 24, 2015, at 5:18 PM, Eric Stevens <mi...@gmail.com> wrote:
> 
> > I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
> 
> My own testing agrees very strongly with this.  When this topic came up on this list before, there was a concern that batch coordination produces GC pressure in your cluster because you're involving nodes which aren't strictly speaking necessary to be involved.  
> 
> Our own testing shows some small impact on this front, but really lightweight GC tuning mitigated the effects by putting a little more room in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what we run in production) we weren't able to measure a difference. 
> 
> Our testing shows data loads being as much as 5x to 8x faster when using small concurrent batches over using single statements concurrently.  We tried three different concurrency models.
> 
> To save on coordinator overhead, we group the statements in our "batch" by replica (using the functionality exposed by the DataStax Java driver), and do essentially token aware batching.  This still has a small amount of additional coordinator overhead (since the data size of the unit of work is larger, and sits in memory in the coordinator longer).  We've been running this way successfully for months with sustained rates north of 50,000 mutates per second.  We burst much higher.
> 
> Through trial and error we determined we got diminishing returns in the realm of 100 statements per token-aware batch.  It looks like your own data bears that out as well.  I'm sure that's workload dependent though.
> 
> I've been disagreed with on this topic in this list in the past despite the numbers I was able to post.  Nobody has shown me numbers (nor anything else concrete) that contradict my position though, so I stand by it.  There's no question in my mind, if your mutates are of any significant volume and you care about the performance of them, token aware unlogged batching is the right strategy.  When we reduce our batch sizes or switch to single async statements, we fall over immediately.  
> 
> On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <gerard.maas@gmail.com <ma...@gmail.com>> wrote:
> General advice advocates for individual async inserts as the fastest way to insert data into Cassandra. Our insertion mechanism is based on that model and recently we have been evaluating performance, looking to measure and optimize our ingestion rate.
> 
> I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts.
> 
> In our tests, unlogged batch shows increased throughput and lower cluster CPU usage, so I'm wondering where the tradeoff might be.
> 
> I compiled those observations in this document that I'm sharing and opening up for comments.  Are we observing some artifact or should we set the record straight for unlogged batches to achieve better insertion throughput?
> 
> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI <https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI>
> 
> Let me know.
> 
> Kind regards, 
> 
> Gerard.
> 

Regards,

Ryan Svihla


Re: To batch or not to batch: A question for fast inserts

Posted by Eric Stevens <mi...@gmail.com>.
> I side-tracked some punctual benchmarks and stumbled on the observations
of unlogged inserts being *A LOT* faster than the async counterparts.

My own testing agrees very strongly with this.  When this topic came up on
this list before, there was a concern that batch coordination produces GC
pressure in your cluster because you're involving nodes which aren't *strictly
speaking* necessary to be involved.

Our own testing shows some small impact on this front, but really
lightweight GC tuning mitigated the effects by putting a little more room
in Xmn (if you're still on CMS garbage collector).  On G1GC (which is what
we run in production) we weren't able to measure a difference.

Our testing shows data loads being as much as 5x to 8x faster when using
small concurrent batches over using single statements concurrently.  We
tried three different concurrency models.

To save on coordinator overhead, we group the statements in our "batch" by
replica (using the functionality exposed by the DataStax Java driver), and
do essentially token aware batching.  This still has a *small* amount of
additional coordinator overhead (since the data size of the unit of work is
larger, and sits in memory in the coordinator longer).  We've been running
this way successfully for months with *sustained* rates north of 50,000
mutates per second.  We burst *much* higher.

Through trial and error we determined we got diminishing returns in the
realm of 100 statements per token-aware batch.  It looks like your own data
bears that out as well.  I'm sure that's workload dependent though.
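
For what that looks like in practice, a tiny sketch: assuming you already have
the statements grouped by their primary replica (a Map[Host, Seq[Statement]]),
split each group into unlogged batches of at most ~100 statements. The 100 is
only where our returns diminished, so treat it as a knob, not a constant:

import com.datastax.driver.core.{BatchStatement, Host, Statement}

// Split each replica group into unlogged batches of at most `maxPerBatch` statements.
def toSubBatches(byReplica: Map[Host, Seq[Statement]],
                 maxPerBatch: Int = 100): Seq[BatchStatement] =
  byReplica.values.toSeq.flatMap { group =>
    group.grouped(maxPerBatch).map { chunk =>
      val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
      chunk.foreach(s => batch.add(s))
      batch
    }
  }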

I've been disagreed with on this topic in this list in the past despite the
numbers I was able to post.  Nobody has shown me numbers (nor anything else
concrete) that contradict my position though, so I stand by it.  There's no
question in my mind, if your mutates are of any significant volume and you
care about the performance of them, token aware unlogged batching is the
right strategy.  When we reduce our batch sizes or switch to single async
statements, we fall over immediately.

On Tue, Sep 22, 2015 at 7:54 AM, Gerard Maas <ge...@gmail.com> wrote:

> General advice advocates for individual async inserts as the fastest way
> to insert data into Cassandra. Our insertion mechanism is based on that
> model and recently we have been evaluating performance, looking to measure
> and optimize our ingestion rate.
>
> I side-tracked some punctual benchmarks and stumbled on the observations
> of unlogged inserts being *A LOT* faster than the async counterparts.
>
> In our tests, unlogged batch shows increased throughput and lower cluster
> CPU usage, so I'm wondering where the tradeoff might be.
>
> I compiled those observations in this document that I'm sharing and
> opening up for comments.  Are we observing some artifact or should we set
> the record straight for unlogged batches to achieve better insertion
> throughput?
>
>
> https://docs.google.com/document/d/1qSIJ46cmjKggxm1yxboI-KhYJh1gnA6RK-FkfUg6FrI
>
> Let me know.
>
> Kind regards,
>
> Gerard.
>