Posted to user@cassandra.apache.org by malcolm smith <ma...@treehousesystems.com> on 2010/03/26 16:45:04 UTC
Newbie Performance Question
I've been getting a feel for the performance elements of Cassandra using
version 0.5.1. I've done similar tests on HBase before, but Cassandra has
some very appealing aspects that I would like to pursue.
However, I'm not seeing what seems like the common level of performance
others are seeing.
Perf summary:
My test program inserts 100K 5-character strings, each with 10 bytes of value
data, into a single row of a single column family. The column family is sorted
by raw bytes.
Single-thread inserts yield 277 inserts per second at consistency level ZERO
(or 3.61 milliseconds per insert).
Single-thread inserts yield 207 inserts per second at consistency level ONE
(or 4.83 milliseconds per insert).
With 5 threads (actually 5 processes running simultaneously, inserting to 5
different top-level key values):
Five-thread inserts yield 94 inserts per second at consistency level ONE
(or 11 milliseconds per insert).
I see people on this mailing list with 3,000 or more inserts per second, so it
seems like I'm off by an order of magnitude or more.
Also a similar test on HBase with a single thread gets me 3,333 inserts per
second on the same laptop machine.
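For reference, the per-insert times quoted above follow directly from the throughput figures, assuming a single synchronous client that waits for each insert to return before sending the next. A minimal sketch of that arithmetic (the function name `ms_per_insert` is made up for illustration):

```python
def ms_per_insert(inserts_per_second: float) -> float:
    """Average wall-clock milliseconds per insert for one blocking client."""
    return 1000.0 / inserts_per_second


# Throughput figures taken from the performance summary above.
measurements = [
    ("ZERO, single thread", 277),
    ("ONE, single thread", 207),
    ("ONE, five processes", 94),
]

for label, rate in measurements:
    print(f"{label}: {rate} inserts/s = {ms_per_insert(rate):.2f} ms per insert")
```

Because a blocking client can never exceed 1000 / latency-in-ms inserts per second, figures like these say more about round-trip time per call than about what the server can sustain under concurrent load.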
Background: I'm running standalone (single node) on a 2-core 64-bit Dell
laptop running Ubuntu 9.10 / 2.6.31-20-generic with 8GB RAM and a 240GB SSD
disk drive. (See Java setup below.)
Sun 6 Java VM -
-Xdebug -Xms512M -Xmx1G -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90
-XX:+AggressiveOpts
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=1 -XX:+CMSParallelRemarkEnabled
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8085
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=/etc/cassandra -Dcassandra-foreground=yes
I've used the default storage-conf.xml but changed these values:
<FlushDataBufferSizeInMB>320</FlushDataBufferSizeInMB>
<FlushIndexBufferSizeInMB>80</FlushIndexBufferSizeInMB>
<MemtableSizeInMB>128</MemtableSizeInMB>
<MemtableObjectCountInMillions>0.5</MemtableObjectCountInMillions>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
<ConcurrentReads>8</ConcurrentReads>
This is the Perl code I'm using for the test. Note that the timestamps and
the values are pre-calculated in another loop to try to isolate the
Cassandra elements from everything else.
$key = "testrow" . $procID;
$client->insert(
    'Keyspace1',
    $key,
    Net::Cassandra::Backend::ColumnPath->new({
        column_family => 'Super1',
        super_column  => 'test-super7',
        column        => $i,
    }),
    $data{$i}->{'val'},
    $data{$i}->{'time'},
    Net::Cassandra::Backend::ConsistencyLevel::ONE
);
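The measurement approach described above (pre-calculate the values and timestamps, then time only the insert calls) might look roughly like the following. This is a sketch in Python rather than Perl, and `fake_insert` is a hypothetical in-memory stand-in for the real client call, so the rate it reports says nothing about Cassandra itself.

```python
import time


def time_inserts(insert_fn, items):
    """Time only the insert calls over pre-built items; return inserts per second."""
    start = time.perf_counter()
    count = 0
    for name, value, timestamp in items:
        insert_fn(name, value, timestamp)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Pre-calculate 100K five-character names, 10-byte values and timestamps
# outside the timed loop, as the post describes.
items = [(f"{i:05d}", b"x" * 10, i) for i in range(100_000)]

# Hypothetical stand-in for the client insert: writes to a dict, not to Cassandra.
store = {}


def fake_insert(name, value, timestamp):
    store[name] = (value, timestamp)


rate = time_inserts(fake_insert, items)
print(f"{rate:.0f} inserts per second against the in-memory stand-in")
```

Swapping `fake_insert` for a function that wraps the real client call would keep the timing logic unchanged.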
Thanks in advance for your help.
Re: Newbie Performance Question
Posted by malcolm smith <ma...@treehousesystems.com>.
Ok, so I guess that anything between 1400 and 3500 inserts per second is a
reasonably good result. We are going to continue working on our custom code,
but it seems like we need a design that uses lots of row keys and fewer column
family keys, and that is heavily threaded.
Thanks for your help in pointing out this utility/test harness.
On Fri, Mar 26, 2010 at 4:14 PM, Scott White <sc...@gmail.com> wrote:
> Right, that's what I meant, thanks for the correction.
>
> On Fri, Mar 26, 2010 at 1:11 PM, Brandon Williams <dr...@gmail.com> wrote:
>
>> On Fri, Mar 26, 2010 at 3:08 PM, Scott White <sc...@gmail.com> wrote:
>>
>>> Yep I believe those are inserts per second. Take the last line:
>>>
>>> "811653,1666,250"
>>>
>>> I believe that's telling you that during that 10 second interval you did
>>> 1666 inserts but your overall insert rate is 811653/250 = 3246.612
>>> inserts/sec.
>>>
>>
>> Actually it averaged 1666 inserts per second in that 10 second interval,
>> but you're correct on the average.
>>
>> -Brandon
>>
>
>
Re: Newbie Performance Question
Posted by Scott White <sc...@gmail.com>.
Right, that's what I meant, thanks for the correction.
On Fri, Mar 26, 2010 at 1:11 PM, Brandon Williams <dr...@gmail.com> wrote:
> On Fri, Mar 26, 2010 at 3:08 PM, Scott White <sc...@gmail.com> wrote:
>
>> Yep I believe those are inserts per second. Take the last line:
>>
>> "811653,1666,250"
>>
>> I believe that's telling you that during that 10 second interval you did
>> 1666 inserts but your overall insert rate is 811653/250 = 3246.612
>> inserts/sec.
>>
>
> Actually it averaged 1666 inserts per second in that 10 second interval,
> but you're correct on the average.
>
> -Brandon
>
Re: Newbie Performance Question
Posted by Brandon Williams <dr...@gmail.com>.
On Fri, Mar 26, 2010 at 3:08 PM, Scott White <sc...@gmail.com> wrote:
> Yep I believe those are inserts per second. Take the last line:
>
> "811653,1666,250"
>
> I believe that's telling you that during that 10 second interval you did
> 1666 inserts but your overall insert rate is 811653/250 = 3246.612
> inserts/sec.
>
Actually it averaged 1666 inserts per second in that 10 second interval, but
you're correct on the average.
-Brandon
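The reading of the stress output given above can be checked against the numbers themselves. The sketch below assumes only the three-column layout shown in the results (total, interval_op_rate, elapsed_time) and uses the last three lines of the run; the function names are made up for illustration. It recomputes the per-second rate within each 10-second interval from consecutive totals, and the overall average from the final line.

```python
import csv
import io

# Header plus the last three lines of the stress output posted in this thread.
SAMPLE = """total,interval_op_rate,elapsed_time
755527,4414,230
794985,3945,240
811653,1666,250
"""


def parse(text):
    """Return a list of (total, interval_op_rate, elapsed_time) integer tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (int(r["total"]), int(r["interval_op_rate"]), int(r["elapsed_time"]))
        for r in reader
    ]


def interval_rates(rows):
    """Per-second rate in each interval, derived from consecutive totals."""
    return [
        (t1 - t0) // (e1 - e0)
        for (t0, _, e0), (t1, _, e1) in zip(rows, rows[1:])
    ]


def overall_rate(rows):
    """Average inserts per second over the whole run, from the last line."""
    total, _, elapsed = rows[-1]
    return total / elapsed


rows = parse(SAMPLE)
print("recomputed interval rates:", interval_rates(rows))
print("reported interval rates:  ", [rate for _, rate, _ in rows[1:]])
print("overall inserts per second:", overall_rate(rows))
```

The recomputed interval rates match the reported second column only if that column is a per-second average, which is consistent with the correction above.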
Re: Newbie Performance Question
Posted by Scott White <sc...@gmail.com>.
Yep I believe those are inserts per second. Take the last line:
"811653,1666,250"
I believe that's telling you that during that 10 second interval you did
1666 inserts but your overall insert rate is 811653/250 = 3246.612
inserts/sec.
Timeouts may be due to your machine(s) being fully saturated? Not sure.
Scott
On Fri, Mar 26, 2010 at 1:00 PM, malcolm smith <
malsmith@treehousesystems.com> wrote:
> Ok I ran the stress test with out of box settings -- 50 threads and 1M row
> inserts. It seems to get as high as 4400 ops per second and as low as 968.
> Am I reading these correctly as inserts per second?
Re: Newbie Performance Question
Posted by malcolm smith <ma...@treehousesystems.com>.
Ok, I ran the stress test with out-of-the-box settings: 50 threads and 1M row
inserts. It seems to get as high as 4400 ops per second and as low as 968.
Am I reading these correctly as inserts per second?
The results are below. But it also generates timeouts and failures in the
Python code, like:
Process Inserter-20:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/test/system/stress.py", line 80, in run
    self.cclient.batch_insert('Keyspace1', key, cfmap, ConsistencyLevel.ONE)
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 583, in batch_insert
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()
total,interval_op_rate,elapsed_time
48318,4831,10
58006,968,20
75447,1744,30
118266,4281,40
160906,4264,50
191501,3059,60
235144,4364,70
270721,3557,80
308977,3825,90
353383,4440,100
386573,3319,110
411550,2497,120
445391,3384,130
476990,3159,140
491169,1417,150
512848,2167,160
547812,3496,170
583997,3618,180
609193,2519,190
653878,4468,200
687692,3381,210
711378,2368,220
755527,4414,230
794985,3945,240
811653,1666,250
On Fri, Mar 26, 2010 at 12:25 PM, Brandon Williams <dr...@gmail.com> wrote:
> On Fri, Mar 26, 2010 at 10:45 AM, malcolm smith <
> malsmith@treehousesystems.com> wrote:
>
>> I've been getting a feel for the performance elements of Cassandra using
>> version 0.5.1. I've done similar tests on HBase before, but Cassandra has
>> some very appealing aspects that I would like to pursue.
>>
>> However, I'm not seeing what seems like the common level of
>> performance others are seeing.
>>
>
> Can you test with stress.py? In 0.5, I think it was in test/stress.
>
> -Brandon
>
Re: Newbie Performance Question
Posted by Brandon Williams <dr...@gmail.com>.
On Fri, Mar 26, 2010 at 10:45 AM, malcolm smith <
malsmith@treehousesystems.com> wrote:
> I've been getting a feel for the performance elements of Cassandra using
> version 0.5.1. I've done similar tests on HBase before, but Cassandra has
> some very appealing aspects that I would like to pursue.
>
> However, I'm not seeing what seems like the common level of performance
> others are seeing.
>
Can you test with stress.py? In 0.5, I think it was in test/stress.
-Brandon