Posted to user@cassandra.apache.org by Oleksandr Petrov <ol...@gmail.com> on 2013/04/18 18:43:10 UTC

Using map type with composite primary key causes significant performance decrease

Hi,

I'm trying to persist some event data. I've tried to identify the
bottleneck, and it seems to come down to this:

If I create a table with a primary key based on (application, environment,
type, emitted_at):

CREATE TABLE events (
  application varchar,
  environment varchar,
  type varchar,
  additional_info map<varchar, varchar>,
  hostname varchar,
  emitted_at timestamp,
  PRIMARY KEY (application, environment, type, emitted_at)
);
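In this PRIMARY KEY form (no inner parentheses), CQL treats only the first column, application, as the partition key. environment, type and emitted_at become clustering columns, so every event for the same application lands in one partition (one wide storage row). The minimal Python sketch below illustrates that rule. split_primary_key and group_by_partition are hypothetical helpers written for illustration, not part of Cassandra or any driver, and the rows in the usage example are made up.

```python
from collections import defaultdict

# The table's primary key, in declaration order.
PRIMARY_KEY = ("application", "environment", "type", "emitted_at")


def split_primary_key(columns):
    """Split PRIMARY KEY (a, b, c, ...) into (partition key, clustering columns).

    Assumes the simple form without inner parentheses, where only the
    first column is the partition key.
    """
    return tuple(columns[:1]), tuple(columns[1:])


def group_by_partition(rows, primary_key=PRIMARY_KEY):
    """Group row dicts by their partition-key values (illustration only)."""
    partition_cols, _ = split_primary_key(primary_key)
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in partition_cols)].append(row)
    return dict(partitions)


if __name__ == "__main__":
    # Made-up rows: different environments and types, same application.
    rows = [
        {"application": "analytics", "environment": "local",
         "type": "event_type", "emitted_at": 1},
        {"application": "analytics", "environment": "prod",
         "type": "other_type", "emitted_at": 2},
    ]
    for key, members in group_by_partition(rows).items():
        print(key, len(members))  # ('analytics',) 2
```

Both rows end up in the same partition even though their environment and type differ, because only application decides placement.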

Then I insert events via CQL using prepared statements:

INSERT INTO events (environment, application, hostname, emitted_at, type, additional_info)
VALUES (?, ?, ?, ?, ?, ?);

The values are: "local" "analytics" "noname"
#inst "2013-04-18T16:37:02.723-00:00" "event_type" {"some" "value"}
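The six values above are written as Clojure literals (#inst marks a timestamp) and are bound positionally to the ? markers. They therefore follow the column order of the INSERT statement, not the CREATE TABLE order. A small Python sketch of that pairing follows. bind_values is a hypothetical helper, and the #inst literal is represented as a UTC datetime.

```python
from datetime import datetime, timezone

# Column order of the "?" markers in the INSERT statement; it differs from the
# CREATE TABLE order (environment comes before application here).
INSERT_COLUMNS = ("environment", "application", "hostname",
                  "emitted_at", "type", "additional_info")


def bind_values(values, columns=INSERT_COLUMNS):
    """Pair positional bind values with column names (hypothetical helper)."""
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


# The values from the post, with the #inst literal as a UTC datetime.
EXAMPLE_VALUES = (
    "local",
    "analytics",
    "noname",
    datetime(2013, 4, 18, 16, 37, 2, 723000, tzinfo=timezone.utc),
    "event_type",
    {"some": "value"},
)

if __name__ == "__main__":
    for column, value in bind_values(EXAMPLE_VALUES).items():
        print(f"{column} = {value!r}")
```

So "local" binds to environment and "analytics" to application, which matches the intent of the post.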

After about 1-2K inserts, I see a significant performance decrease.

I've tried using only emitted_at (timestamp) as the primary key, or writing
additional_info as serialized JSON (varchar) instead of a map. Either change
seems to eliminate the performance degradation.
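The second workaround stores additional_info as a single varchar holding JSON instead of a map<varchar, varchar>. A minimal Python sketch of the encode/decode step follows. It assumes the column type has been changed to varchar, and the function names are illustrative. The trade-off is that the string is always rewritten as a whole: individual keys can no longer be added or removed server-side the way map entries can.

```python
import json


def encode_additional_info(info):
    """Serialize a str -> str dict to compact JSON for a varchar column."""
    # sort_keys gives a stable string for the same map contents.
    return json.dumps(info, sort_keys=True, separators=(",", ":"))


def decode_additional_info(text):
    """Parse the JSON string back into a dict; a null/empty column gives {}."""
    return json.loads(text) if text else {}


if __name__ == "__main__":
    encoded = encode_additional_info({"some": "value"})
    print(encoded)                          # {"some":"value"}
    print(decode_additional_info(encoded))  # {'some': 'value'}
```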

I'm using Cassandra 1.2.3 from the DataStax repository, running on a 2-core
machine with 2 GB of RAM.
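For a sense of scale on this hardware, the following assumes the default heap-sizing logic in the cassandra-env.sh startup script of that era. Under that logic, the max heap is half of RAM capped at 1 GB or a quarter of RAM capped at 8 GB, whichever is larger. The young generation is 100 MB per core or a quarter of the heap, whichever is smaller. A 2-core, 2 GB machine then gets roughly a 1 GB heap with a 200 MB young generation. The Python sketch below re-implements that assumed logic, so check it against the cassandra-env.sh actually installed. MAX_HEAP_SIZE and HEAP_NEWSIZE set explicitly in that file override the calculation.

```python
def default_heap_sizes(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, young_gen_mb) under the assumed cassandra-env.sh rules."""
    half = system_memory_mb // 2
    quarter = half // 2
    half = min(half, 1024)         # half of RAM, capped at 1 GB
    quarter = min(quarter, 8192)   # quarter of RAM, capped at 8 GB
    max_heap_mb = max(half, quarter)
    # Young generation: 100 MB per core or a quarter of the heap, whichever is smaller.
    young_gen_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, young_gen_mb


if __name__ == "__main__":
    heap, young = default_heap_sizes(2048, 2)
    print(f"MAX_HEAP_SIZE={heap}M HEAP_NEWSIZE={young}M")  # MAX_HEAP_SIZE=1024M HEAP_NEWSIZE=200M
```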

What could I be doing wrong here? What might be causing these performance issues?
Thank you


-- 
alex p

Re: Using map type with composite primary key causes significant performance decrease

Posted by aaron morton <aa...@thelastpickle.com>.
> Write performance decreases.
>  
Check the logs for WARN messages from the GCInspector. With 2 GB of RAM and only 2 cores, you may be seeing ParNew GC pauses that stop the server.

> Sometimes I have to wait 3-4 seconds to get a count even though there are only a couple of thousand small entries in the table.
Count of columns in a row?
Could also be a GC issue. 
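Following the suggestion to look for GCInspector warnings, here is a minimal Python filter over the server log. The log path is an assumption (the DataStax packages typically write to /var/log/cassandra/system.log), so adjust it for your install. The function does only a substring match, equivalent to grepping for lines that contain both WARN and GCInspector, and assumes nothing else about the message format.

```python
from pathlib import Path

# Assumed default log location for a package install; adjust as needed.
DEFAULT_LOG = Path("/var/log/cassandra/system.log")


def gc_warnings(lines):
    """Return WARN-level lines that mention GCInspector, without trailing newlines."""
    return [line.rstrip("\n") for line in lines
            if "WARN" in line and "GCInspector" in line]


def scan_log(path=DEFAULT_LOG):
    """Read a log file and return its GCInspector warnings."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return gc_warnings(log)


if __name__ == "__main__":
    if DEFAULT_LOG.exists():
        for warning in scan_log():
            print(warning)
    else:
        print(f"log file not found: {DEFAULT_LOG}")
```

Frequent hits while the slow inserts or counts are running would support the GC explanation.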

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/04/2013, at 9:48 AM, Oleksandr Petrov <ol...@gmail.com> wrote:

> Write performance decreases.
> 
> Reads are basically blocked, too. Sometimes I have to wait 3-4 seconds to get a count even though there are only a couple of thousand small entries in the table.


Re: Using map type with composite primary key causes significant performance decrease

Posted by Oleksandr Petrov <ol...@gmail.com>.
Write performance decreases.

Reads are basically blocked, too. Sometimes I have to wait 3-4 seconds to
get a count even though there are only a couple of thousand small entries in
the table.


On Thu, Apr 18, 2013 at 8:37 PM, aaron morton <aa...@thelastpickle.com> wrote:

> After about 1-2K inserts I get significant performance decrease.
>
> A decrease in performance doing what?


-- 
alex p

Re: Using map type with composite primary key causes significant performance decrease

Posted by aaron morton <aa...@thelastpickle.com>.
> After about 1-2K inserts I get significant performance decrease.
A decrease in performance doing what?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
