Posted to user@cassandra.apache.org by onmstester onmstester <on...@zoho.com> on 2018/03/18 08:23:43 UTC

Cassandra client tuning

I need to insert several million records into Cassandra within seconds. I am using one client with executeAsync and the following configuration:

maxConnectionsPerHost = 5

maxRequestsPerHost = 32K

maxAsyncQueue at client side = 100K


I could achieve only 25% of the throughput I need. Client CPU is above 80%, and increasing the number of threads causes some executeAsync calls to fail, so the configuration above is the best the client can handle. Cassandra node CPU is below 30% on average. The data has no locality with respect to partition keys, and I can't use the createSStable mechanism. Is there any client-side tuning I'm missing? The server side is already tuned according to the DataStax recommendations.

Sent using Zoho Mail
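
As a rough illustration of the client-side queue limit described in the message above, the following sketch caps the number of unfinished writes with a semaphore, so the producer blocks when the cap is reached instead of overrunning the client. It is only a sketch under assumptions: `write_record`, `insert_all`, and the cap of 1000 are hypothetical names and values, and a thread pool stands in for the driver's asynchronous call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 1000  # hypothetical cap; the thread describes a 100K client-side queue


def write_record(record):
    """Hypothetical stand-in for one insert; a real client would call its driver here."""
    return record


def insert_all(records, max_in_flight=MAX_IN_FLIGHT, workers=8):
    """Submit every record while holding at most max_in_flight unfinished writes."""
    slots = threading.BoundedSemaphore(max_in_flight)
    lock = threading.Lock()
    failures = []
    completed = 0

    def on_done(future):
        nonlocal completed
        # Runs when a write finishes: record the outcome, then free the slot.
        try:
            future.result()
            with lock:
                completed += 1
        except Exception as exc:
            with lock:
                failures.append(exc)
        finally:
            slots.release()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for record in records:
            slots.acquire()  # blocks the producer once the cap is reached
            pool.submit(write_record, record).add_done_callback(on_done)
    # Leaving the "with" block waits for every submitted write to finish.
    return completed, failures
```

Blocking the producer is one possible way to apply backpressure; whether it helps with the failing calls described above would have to be measured on the real client.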

Re: Cassandra client tuning

Posted by Ben Slater <be...@instaclustr.com>.

“* 1000 statements in each batch” sounds like you are batching in both
cases. I wouldn't expect things to get better with batch sizes larger than
that. We’ve generally found that around 100 is the sweet spot, but I’m sure
it’s data-specific.

On Sun, 18 Mar 2018 at 21:17 onmstester onmstester <on...@zoho.com>
wrote:

> I'm using a queue of 100 executeAsync calls * 1000 statements in each batch
> = 100K inserts queued, the same as the 100K insert queue in the non-batch
> scenario. Using more than 1000 statements per batch throws a batch limit
> exception, and some documents recommend not changing batch_size_limit?!


*Ben Slater*

*Chief Product Officer <https://www.instaclustr.com/>*

<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.

Re: Cassandra client tuning

Posted by onmstester onmstester <on...@zoho.com>.

I'm using a queue of 100 executeAsync calls * 1000 statements in each batch = 100K inserts queued, the same as the 100K insert queue in the non-batch scenario.

Using more than 1000 statements per batch throws a batch limit exception, and some documents recommend not changing batch_size_limit?!
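
The queue arithmetic in this message can be checked with a short sketch. The function names are hypothetical. The 100-statement batch size used in the second calculation is the figure suggested elsewhere in this thread, not something the poster tested.

```python
def queued_statements(in_flight_requests, statements_per_batch):
    """Total statements waiting when every in-flight request carries one batch."""
    return in_flight_requests * statements_per_batch


def requests_needed(total_statements, statements_per_batch):
    """Number of batch requests needed to carry total_statements (ceiling division)."""
    return -(-total_statements // statements_per_batch)


# 100 in-flight requests of 1000 statements each hold 100,000 statements.
print(queued_statements(100, 1000))
# The same 100,000 statements in batches of 100 need 1,000 requests.
print(requests_needed(100_000, 100))
```

So moving from 1000 to 100 statements per batch while keeping the same 100K backlog would mean roughly ten times as many requests in flight.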

---- On Sun, 18 Mar 2018 13:14:54 +0330 Ben Slater <ben.slater@instaclustr.com> wrote ----

When you say batch was worse than async in terms of throughput, are you comparing throughput with the same number of threads or something? I would have thought that if you have much less CPU usage on the client with batching, and your Cassandra cluster doesn’t sound terribly stressed, then there is room to increase threads on the client to raise throughput (unless you're bottlenecked on IO or something)?


Re: Cassandra client tuning

Posted by Ben Slater <be...@instaclustr.com>.

When you say batch was worse than async in terms of throughput, are you
comparing throughput with the same number of threads or something? I would
have thought that if you have much less CPU usage on the client with
batching, and your Cassandra cluster doesn’t sound terribly stressed, then
there is room to increase threads on the client to raise throughput (unless
you're bottlenecked on IO or something)?
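
One way to make the comparison asked about above is to measure throughput at several thread counts with the same batches. The sketch below is a minimal harness under assumptions: `write_batch` is a hypothetical stand-in that only counts rows, so the numbers it prints say nothing about a real cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_batch(batch):
    """Hypothetical stand-in for sending one batch; returns the rows written."""
    return len(batch)


def measure_throughput(batches, threads):
    """Return (rows written, rows per second) when `threads` workers send the batches."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        rows = sum(pool.map(write_batch, batches))
    elapsed = time.perf_counter() - start
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return rows, rate


if __name__ == "__main__":
    batches = [list(range(100)) for _ in range(1000)]
    for threads in (1, 4, 16):
        rows, rate = measure_throughput(batches, threads)
        print(f"{threads:>2} threads: {rows} rows, {rate:,.0f} rows/s")
```

Running the batched and unbatched variants through the same harness at the same thread counts would show whether the comparison was like for like.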

On Sun, 18 Mar 2018 at 20:27 onmstester onmstester <on...@zoho.com>
wrote:

> Input data does not preserve good locality, and I've already tested batch
> inserts: throughput was worse than with executeAsync, but client-side CPU
> usage was much lower.

Re: Cassandra client tuning

Posted by onmstester onmstester <on...@zoho.com>.

Input data does not preserve good locality, and I've already tested batch inserts: throughput was worse than with executeAsync, but client-side CPU usage was much lower.

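---- On Sun, 18 Mar 2018 12:46:02 +0330 Ben Slater <ben.slater@instaclustr.com> wrote ----

You will probably find grouping writes into small batches improves overall performance (if you are not doing it already). See the following presentation for some more info: https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes

Cheers

Ben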


Re: Cassandra client tuning

Posted by Ben Slater <be...@instaclustr.com>.

You will probably find grouping writes into small batches improves overall
performance (if you are not doing it already). See the following
presentation for some more info:
https://www.slideshare.net/Instaclustr/microbatching-highperformance-writes

Cheers
Ben
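
The grouping of writes into small batches suggested in this message can be sketched as a simple chunking helper. This is an illustration only: `micro_batches` is a hypothetical name, the default of 100 echoes the figure mentioned elsewhere in this thread, and the sketch does not talk to Cassandra or group rows by partition key.

```python
from itertools import islice


def micro_batches(records, batch_size=100):
    """Yield successive lists of at most batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 250 records become batches of 100, 100 and 50.
sizes = [len(batch) for batch in micro_batches(range(250))]
print(sizes)
```

Each yielded list would then be sent as one request; the right size is data-specific and would need to be found by measurement.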
