Posted to commits@cassandra.apache.org by "Martin Bligh (JIRA)" <ji...@apache.org> on 2014/04/29 12:59:15 UTC

[jira] [Reopened] (CASSANDRA-7103) Very poor performance with simple setup

     [ https://issues.apache.org/jira/browse/CASSANDRA-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Bligh reopened CASSANDRA-7103:
-------------------------------------


We're not understanding each other. I'm not updating the same row; I'm just inserting new ones, so CASSANDRA-5417 doesn't seem to be relevant.

There are two fundamental and very serious performance problems in here:

1. There's a 70x performance degradation over time as I insert a few thousand rows; it's fixed by running "nodetool flush".

2. Scaling of parallel row inserts (just new inserts, not updates) on a single table per node is terrible. 10s per insert with 64 writers?

If you want me to break this out into two separate bugs, that's fine; I probably should have done that to start with.

> Very poor performance with simple setup
> ---------------------------------------
>
>                 Key: CASSANDRA-7103
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7103
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Fedora 19 (also happens on Ubuntu), Cassandra 2.0.7, DSC standard install
>            Reporter: Martin Bligh
>
> Single node (this is just development: a 32GB, 20-core server), single disk array.
> Create the following table:
> {code}
> CREATE TABLE reut (
>   time_order bigint,
>   time_start bigint,
>   ack_us map<int, int>,
>   gc_strategy map<text, int>,
>   gc_strategy_symbol map<text, int>,
>   gc_symbol map<text, int>,
>   ge_strategy map<text, int>,
>   ge_strategy_symbol map<text, int>,
>   ge_symbol map<text, int>,
>   go_strategy map<text, int>,
>   go_strategy_symbol map<text, int>,
>   go_symbol map<text, int>,
>   message_type map<text, int>,
>   PRIMARY KEY (time_order, time_start)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={};
> {code}
> Now I just insert data into it (using python driver, async insert, prepared insert statement). Each row only fills out one of the gc_*, go_*, or ge_* columns, and there's something like 20-100 entries per map column, occasionally 1000, but it's nothing huge. 
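> For illustration, a minimal sketch of this kind of insert loop - assuming the DataStax cassandra-driver package and a keyspace named "test" (the actual script is not attached here, and the map contents below are made up):
> {code}
> # Minimal sketch, not the actual test script: assumes the cassandra-driver
> # package and a keyspace "test" that already contains the reut table.
> import time
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['127.0.0.1'])
> session = cluster.connect('test')
> insert = session.prepare(
>     "INSERT INTO reut (time_order, time_start, gc_strategy) VALUES (?, ?, ?)")
>
> rows = 685
> start = time.time()
> futures = []
> for i in range(rows):
>     # each row fills only one of the map columns, here with ~50 entries
>     strategy = dict(('strat%d' % j, j) for j in range(50))
>     futures.append(session.execute_async(insert, (i, i, strategy)))
> for f in futures:
>     f.result()  # wait for every async insert to be acknowledged
> elapsed = time.time() - start
> print('%d inserts in %f seconds (%f Operations/s)' % (rows, elapsed, rows / elapsed))
> cluster.shutdown()
> {code}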
> First run: 685 inserts in 1.004860 seconds (681.687053 Operations/s).
> OK, not great, but that's fine.
> Now throw 50,000 rows at it.
> Now repeat the first run, and it takes 53s to insert the same 685 rows - about 10 rows per second.
> It's not IO bound - "iostat 1" shows the disk quiescent for 9 seconds, then a ~640KB write, then quiescent again - seems like the periodic flush/fsync.
> Run "nodetool flush" and performance goes back to as before!!!!
> Not sure why this gets so slow - I think it just builds up huge commit logs and memtables, but never writes sstables out to the data/ directory because I only have one table? That doesn't seem like a good situation.
> Worse ... if you let the python driver just throw stuff at it async (I think this allows up to 128 pending requests if I understand the underlying protocol), it gets so slow that a single write takes over 10s and times out. Seems to be some sort of synchronization problem in Java ... if I limit the concurrent async requests to the number in the left column below, I get the elapsed seconds on the right:
> 1: 103 seconds
> 2: 63 seconds
> 8: 53 seconds
> 16: 53 seconds
> 32: 66 seconds
> 64: so slow it explodes in timeouts on write (over 10s each).
> I guess there's some thundering-herd-type locking issue in whatever Java primitive you are using to lock concurrent access to a single table; I know some of the java.util.concurrent stuff has this issue. So for the other tests above, I was limiting async writes to 16 pending.
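> For reference, a minimal sketch of how that 16-pending cap could be enforced with the python driver (assuming the session and prepared statement from the earlier sketch; the helper name and the semaphore approach are just one way to do it):
> {code}
> # Sketch of capping in-flight async writes at 16, as used for the timings
> # above; assumes the session/insert objects from the earlier sketch.
> from threading import Semaphore
>
> MAX_IN_FLIGHT = 16
> slots = Semaphore(MAX_IN_FLIGHT)
>
> def throttled_insert(session, insert, params):
>     slots.acquire()  # blocks while 16 writes are already pending
>     future = session.execute_async(insert, params)
>     # free the slot when the write is acknowledged or fails
>     future.add_callbacks(lambda _: slots.release(),
>                          lambda exc: slots.release())
>     return future
> {code}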



--
This message was sent by Atlassian JIRA
(v6.2#6252)