Posted to user@cassandra.apache.org by malcolm smith <ma...@treehousesystems.com> on 2010/03/26 16:45:04 UTC

Newbie Performance Question

I've been getting a feel for the performance elements of Cassandra using
version 0.5.1.  I've done similar tests on HBase before, but Cassandra has
some very appealing aspects that I would like to pursue.

However, I'm not seeing what seems to be the common level of performance
others are seeing.

Perf summary:

My test program inserts 100K 5-character strings, each with 10 bytes of value
data, into a single row of a single column family.  The column family is raw
byte sorted.

Single thread inserts - yields 277 inserts per second with ZERO consistency
level (or 3.61 milliseconds per insert)
Single thread inserts - yields 207 inserts per second with ONE consistency
level (or 4.83 milliseconds per insert)

With 5 threads (actually 5 processes running simultaneously inserting to 5
different top level key values)
Five thread inserts    - yields  94 inserts per second with ONE consistency
level (or 11 milliseconds per insert)
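
The per-insert times quoted above follow from the insert rates by simple division. A minimal sketch of that arithmetic, assuming inserts are issued back to back so that time per insert is the reciprocal of the rate (the function name is only illustrative):

```python
def ms_per_insert(inserts_per_second: float) -> float:
    """Average milliseconds per insert, assuming back-to-back inserts."""
    return 1000.0 / inserts_per_second


# Rates taken from the figures reported in this message.
reported = [
    ("ZERO, single thread", 277),
    ("ONE, single thread", 207),
    ("ONE, five processes", 94),
]

for label, rate in reported:
    print(f"{label}: {rate} inserts/s = {ms_per_insert(rate):.2f} ms per insert")
```

The last case works out to roughly 10.6 ms, which the message rounds to 11.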

I see people on this mailing list reporting 3,000 or more inserts per second,
so it seems like I'm off by an order of magnitude or more.

Also a similar test on HBase with a single thread gets me 3,333 inserts per
second on the same laptop machine.


Background:  I'm running standalone (single node) on a 2-core 64-bit Dell
laptop running Ubuntu 9.10 / 2.6.31-20-generic with 8GB RAM and a 240GB SSD
disk drive.   (See Java setup below.)

Sun 6 Java VM -
-Xdebug -Xms512M -Xmx1G -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90
-XX:+AggressiveOpts
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=1 -XX:+CMSParallelRemarkEnabled
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8085
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=/etc/cassandra -Dcassandra-foreground=yes

I've used the default storage-conf.xml but changed these values:
 <FlushDataBufferSizeInMB>320</FlushDataBufferSizeInMB>
 <FlushIndexBufferSizeInMB>80</FlushIndexBufferSizeInMB>
<MemtableSizeInMB>128</MemtableSizeInMB>
<MemtableObjectCountInMillions>0.5</MemtableObjectCountInMillions>
<MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
<ConcurrentReads>8</ConcurrentReads>

This is the Perl code I'm using for the test.  Note that the timestamps and
the values are pre-calculated in another loop to try to isolate the
Cassandra elements from everything else.

# One row key per test process.
$key = "testrow" . $procID;

# Column name is $i; value and timestamp were pre-calculated in %data.
$client->insert(
    'Keyspace1',
    $key,
    Net::Cassandra::Backend::ColumnPath->new({
        column_family => 'Super1',
        super_column  => 'test-super7',
        column        => $i,
    }),
    $data{$i}->{'val'},
    $data{$i}->{'time'},
    Net::Cassandra::Backend::ConsistencyLevel::ONE
);
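
The approach described above (build all values and timestamps first, then time only the insert calls) can be sketched as a small harness. This is an illustration in Python rather than the Perl used in the test, and `fake_insert` is a hypothetical in-memory stand-in, not a real Cassandra client call:

```python
import time


def time_inserts(insert, items):
    """Time only the insert calls; `items` must already be fully built."""
    start = time.perf_counter()
    for column, value, timestamp in items:
        insert(column, value, timestamp)
    elapsed = time.perf_counter() - start
    count = len(items)
    return {
        "count": count,
        "elapsed_s": elapsed,
        "inserts_per_s": count / elapsed if elapsed > 0 else float("inf"),
        "ms_per_insert": 1000.0 * elapsed / count if count else 0.0,
    }


# Hypothetical stand-in for the client: stores columns in a dict.
store = {}


def fake_insert(column, value, timestamp):
    store[column] = (value, timestamp)


# Pre-calculate 100K items outside the timed loop:
# 5-character column names, 10-byte values, integer timestamps.
items = [(f"{i:05d}", b"0123456789", i) for i in range(100_000)]

stats = time_inserts(fake_insert, items)
print(f"{stats['count']} inserts, {stats['inserts_per_s']:.0f}/s, "
      f"{stats['ms_per_insert']:.4f} ms each")
```

Swapping `fake_insert` for a real client call would give the numbers of interest; with the stub it only measures loop overhead.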



Thanks in advance for your help.

Re: Newbie Performance Question

Posted by malcolm smith <ma...@treehousesystems.com>.
Ok - so I guess that between 1400 and 3500 inserts per second is a reasonably
good result.  We are going to continue working on our custom code, but it
seems like we need a design that uses lots of row keys and fewer column
family keys and is heavily threaded.

Thanks for your help in pointing out this utility/test harness.
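
A rough sketch of the shape such a design might take: several workers, each writing a few columns to many distinct row keys. Everything here is an assumption for illustration. `FakeClient` is a hypothetical in-memory stand-in (not a Cassandra API), the key naming is made up, and a real test would likely give each worker its own connection or use separate processes.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeClient:
    """Hypothetical in-memory stand-in for a client; not a real API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        with self._lock:
            self.rows[row_key][column] = value


def load(client, worker_id, n_rows, cols_per_row):
    """Write a few columns to each of many row keys owned by this worker."""
    for r in range(n_rows):
        row_key = f"row-{worker_id}-{r}"
        for c in range(cols_per_row):
            client.insert(row_key, f"c{c}", b"0123456789")
    return n_rows * cols_per_row


client = FakeClient()
workers = 8
with ThreadPoolExecutor(max_workers=workers) as pool:
    futures = [pool.submit(load, client, w, 100, 5) for w in range(workers)]
    total = sum(f.result() for f in futures)

print(f"{total} inserts across {len(client.rows)} row keys")
```

Each worker owns a disjoint set of row keys, so no two workers write to the same row.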


Re: Newbie Performance Question

Posted by Scott White <sc...@gmail.com>.
Right, that's what I meant, thanks for the correction.


Re: Newbie Performance Question

Posted by Brandon Williams <dr...@gmail.com>.
On Fri, Mar 26, 2010 at 3:08 PM, Scott White <sc...@gmail.com> wrote:

> Yep I believe those are inserts per second. Take the last line:
>
> "811653,1666,250"
>
> I believe that's telling you that during that 10 second interval you did
> 1666 inserts but your overall insert rate is 811653/250 = 3246.612
> inserts/sec.
>

Actually it averaged 1666 inserts per second in that 10 second interval, but
you're correct on the average.

-Brandon
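
The two readings can be checked against the output itself. A minimal sketch, assuming the three-column format shown in the stress output (`total,interval_op_rate,elapsed_time`) and using its last three lines as sample data; the function name `summarize` is only illustrative:

```python
import csv
import io

# Last three lines of the stress output from this thread, plus its header.
SAMPLE = """total,interval_op_rate,elapsed_time
755527,4414,230
794985,3945,240
811653,1666,250
"""


def summarize(text):
    """Derive per-interval and overall rates from cumulative totals."""
    rows = [{k: int(v) for k, v in row.items()}
            for row in csv.DictReader(io.StringIO(text))]
    out = []
    prev = None
    for row in rows:
        derived = None
        if prev is not None:
            dt = row["elapsed_time"] - prev["elapsed_time"]
            derived = (row["total"] - prev["total"]) / dt
        out.append({
            **row,
            "derived_interval_rate": derived,
            "overall_rate": row["total"] / row["elapsed_time"],
        })
        prev = row
    return out


for r in summarize(SAMPLE):
    print(r["elapsed_time"], r["interval_op_rate"],
          r["derived_interval_rate"], round(r["overall_rate"], 3))
```

For the last line, the change in total over the 10 second interval is 16668 inserts, or about 1666 per second, which matches the reported interval_op_rate, while 811653/250 gives the overall average.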

Re: Newbie Performance Question

Posted by Scott White <sc...@gmail.com>.
Yep I believe those are inserts per second. Take the last line:

"811653,1666,250"

I believe that's telling you that during that 10 second interval you did
1666 inserts but your overall insert rate is 811653/250 = 3246.612
inserts/sec.

Timeouts may be due to your machine(s) being fully saturated? Not sure.

Scott


Re: Newbie Performance Question

Posted by malcolm smith <ma...@treehousesystems.com>.
Ok, I ran the stress test with out-of-the-box settings: 50 threads and 1M row
inserts.   It seems to get as high as 4400 ops per second and as low as 968.
 Am I reading these correctly as inserts per second?

The results are below.  It also generates timeouts and failures in the
Python code like:

Process Inserter-20:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/test/system/stress.py", line 80, in run
    self.cclient.batch_insert('Keyspace1', key, cfmap, ConsistencyLevel.ONE)
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 583, in batch_insert
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()
    self.recv_batch_insert()
  File "/home/malsmith/dev/apache-cassandra-0.5.1-src/interface/gen-py/cassandra/Cassandra.py", line 611, in recv_batch_insert
    raise result.te
TimedOutException: TimedOutException()


total,interval_op_rate,elapsed_time
48318,4831,10
58006,968,20
75447,1744,30
118266,4281,40
160906,4264,50
191501,3059,60
235144,4364,70
270721,3557,80
308977,3825,90
353383,4440,100
386573,3319,110
411550,2497,120
445391,3384,130
476990,3159,140
491169,1417,150
512848,2167,160
547812,3496,170
583997,3618,180
609193,2519,190
653878,4468,200
687692,3381,210
711378,2368,220
755527,4414,230
794985,3945,240
811653,1666,250



Re: Newbie Performance Question

Posted by Brandon Williams <dr...@gmail.com>.
On Fri, Mar 26, 2010 at 10:45 AM, malcolm smith <
malsmith@treehousesystems.com> wrote:

> I've been getting a feel for the performance elements of Cassandra using
> version 0.5.1.  I've done similar tests on HBase before, but Cassandra has
> some very appealing aspects that I would like to pursue.
>
> However, I'm not seeing what seems to be the common level of performance
> others are seeing.
>

Can you test with stress.py?  In 0.5, I think it was in test/stress.

-Brandon