Posted to user@cassandra.apache.org by "Martin Probst (RobHost Support)" <su...@robhost.de> on 2010/03/18 19:03:42 UTC

write performance thrift interfaces

Hi,

We've tested write performance on single-node and dual-node clusters, and the results are strangely poor. We get about 30 inserts per second, which seems rather slow. The strange thing is that the nodes we used (single CPU, 3 GB RAM, single disk) show a load of only 0.02-0.05 while the inserts run. There are no disk or network bottlenecks and no swapping. The insert script reads values from stdin and inserts them into Cassandra. So it seems that the Thrift interfaces we used are the bottleneck (we've tested PHP, Perl, and Java, and the results are nearly the same). We used version 0.5.1 and the default config with adjusted ColumnFamilies.

Are there any hints or suggestions?

Cheers,
Martin

Re: write performance thrift interfaces

Posted by "Martin Probst (RobHost Support)" <su...@robhost.de>.
What do you mean by that? Are there config adjustments to make, or do you mean the inserting client?

Martin

On 18.03.2010 at 19:18, Jonathan Ellis wrote:

> Perhaps you're only inserting with a single thread?


Re: write performance thrift interfaces

Posted by Jonathan Ellis <jb...@gmail.com>.
Perhaps you're only inserting with a single thread?
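
The reasoning behind this question can be put into numbers. The following is a minimal sketch, not a measurement: it assumes every insert is a blocking call and that each thread spends the same time waiting per call. The function names and the thread counts are illustrative assumptions; only the 30 inserts per second figure comes from the original post.

```python
def implied_latency_s(inserts_per_second: float) -> float:
    """Average time one blocking insert takes if a single thread achieves this rate."""
    return 1.0 / inserts_per_second


def throughput_ceiling(latency_s: float, threads: int) -> float:
    """Upper bound on inserts/s when each thread waits latency_s per call."""
    return threads / latency_s


# 30 inserts/s is the rate reported in the original post.
latency = implied_latency_s(30)
print(f"implied latency per insert: {latency * 1000:.1f} ms")

# Thread counts are arbitrary examples, not values taken from the thread.
for threads in (1, 4, 8):
    ceiling = throughput_ceiling(latency, threads)
    print(f"{threads} thread(s): at most {ceiling:.0f} inserts/s")
```

If a client like this is waiting on round trips most of the time, a near-idle server load such as the reported 0.02-0.05 would be consistent with the low insert rate.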


Re: write performance thrift interfaces

Posted by "Martin Probst (RobHost Support)" <su...@robhost.de>.
Hi Brandon,

I've recoded my client to use threads. Now I'm getting roughly 240 inserts per second (I think the bottleneck is now the virtualized hardware, which has a single CPU). The stress.py script gives about 50 inserts/sec.

Next week I'll test Cassandra on real hardware to see whether it performs better on a multicore system. Thanks for your help.

Cheers,
Martin
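
For readers who want to reproduce the effect of moving to threads without a cluster, here is a small sketch. It is not Martin's client: `fake_insert` is a hypothetical stand-in that sleeps for an assumed 10 ms instead of calling Cassandra, and the thread counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_insert(row, latency_s=0.01):
    """Hypothetical stand-in for one blocking insert; sleeps instead of contacting a server."""
    time.sleep(latency_s)
    return row


def run_inserts(rows, threads):
    """Insert all rows with the given number of worker threads; return inserts per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every call and re-raises any worker exception.
        done = list(pool.map(fake_insert, rows))
    elapsed = time.perf_counter() - start
    return len(done) / elapsed


rows = [f"key{i}" for i in range(40)]
for threads in (1, 8):
    print(f"{threads} thread(s): {run_inserts(rows, threads):.0f} inserts/s")
```

In a real client each worker thread would normally hold its own connection, since Thrift client objects are generally not safe to share between threads.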


Re: write performance thrift interfaces

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Mar 18, 2010 at 1:22 PM, Martin Probst (RobHost Support) <
support@robhost.de> wrote:

> Hi Tom,
>
> No, we're not using a connection pool, only plain Java run from the command line.
>
> Cheers,
> Martin
>
>
The second graph here is relevant:
http://spyced.blogspot.com/2010/01/cassandra-05.html

Rather than create your own benchmark, I recommend using
contrib/py_stress/stress.py which is designed for this.

-Brandon

Re: write performance thrift interfaces

Posted by "Martin Probst (RobHost Support)" <su...@robhost.de>.
Hi Tom,

No, we're not using a connection pool, only plain Java run from the command line.

Cheers,
Martin

On 18.03.2010 at 19:18, Tom Chen wrote:

> Hi Martin,
> 
> Are you using a connection pool? I have been able to get 1000+ inserts per second with Java code on one Cassandra node with small values (100 bytes).
> 
> Tom


Re: write performance thrift interfaces

Posted by Tom Chen <to...@gogii.net>.
Hi Martin,

Are you using a connection pool? I have been able to get 1000+ inserts
per second with Java code on one Cassandra node with small values (100 bytes).

Tom




-- 
Tom Chen
Software Architect
GOGII, Inc
tom@gogii.net
650-468-6318
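
The connection pool Tom asks about can be sketched roughly as follows. This is an illustration of the general idea only, not the pool Tom used: `FakeConnection` and `ConnectionPool` are hypothetical names, and the fake connection never opens a socket.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an open client connection to one node."""

    opened = 0  # counts how many connections were ever created

    def __init__(self, host):
        FakeConnection.opened += 1
        self.host = host

    def insert(self, key, columns):
        return (self.host, key, len(columns))


class ConnectionPool:
    """Fixed-size pool: connections are opened once and handed out on demand."""

    def __init__(self, host, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection(host))

    @contextmanager
    def connection(self):
        conn = self._idle.get()       # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)      # hand it back for reuse instead of closing it


pool = ConnectionPool("localhost", size=4)
for i in range(1000):
    with pool.connection() as conn:
        conn.insert(f"key{i}", {"NAME": "example"})
print(f"connections opened for 1000 inserts: {FakeConnection.opened}")
```

The point of the design is that the cost of opening a connection is paid a fixed number of times, and a thread-safe queue lets several worker threads borrow connections without sharing one at the same moment.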

Re: write performance thrift interfaces

Posted by "Martin Probst (RobHost Support)" <su...@robhost.de>.
Hi Roger,

We've only adjusted the names of the keyspaces and the column families. This is the second Perl benchmark script, which switches nodes after every 100 datasets:

#!/usr/bin/perl

use strict;
use warnings;

use Net::Cassandra;

my $host1         = "localhost";
my $host2         = "10.11.12.1";
my $keyspace      = "TestCassandra";
my $column_family = "Customer";
$| = 1;    # unbuffered output for the progress counter

# One connection per node, opened once before the loop.
my $cassandra1 = Net::Cassandra->new( hostname => $host1 );
my $client1    = $cassandra1->client;
my $cassandra2 = Net::Cassandra->new( hostname => $host2 );
my $client2    = $cassandra2->client;

# Column names, in the order the fields follow the row key on each input line.
my @names = qw( NAME ADDRESS MAIL CONTACT TEL WEB );

my $i       = 0;
my $c_level = Net::Cassandra::Backend::ConsistencyLevel::ZERO;
while ( my $line = <STDIN> )
{
    chomp $line;    # keep the trailing newline out of the last value

    # Input format: key|name|address|mail|contact|tel|web
    my @stuff = split( /\|/, $line );
    my $ts    = time();

    # Alternate between the two nodes in blocks of 100 rows.
    my $client = ( int( $i / 100 ) % 2 ) ? $client2 : $client1;

    # Wrap each field in a Column, then in a ColumnOrSuperColumn.
    my @columns;
    for my $n ( 0 .. $#names )
    {
        my $c = Net::Cassandra::Backend::Column->new( { name => $names[$n], value => $stuff[ $n + 1 ], timestamp => $ts } );
        push @columns, Net::Cassandra::Backend::ColumnOrSuperColumn->new( { column => $c } );
    }

    my %mutation = ( $column_family => \@columns );

    # One Thrift call per row, issued from a single thread.
    $client->batch_insert(
        $keyspace,
        $stuff[0],
        \%mutation,
        $c_level,
    );

    $i++;
    print "insert element $i\r";
}

exit(0);

Thanks in advance,
Martin
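
The script above prints a running counter but not a rate. A possible way to measure inserts per second around the same input format is sketched below. It is an illustration under assumptions: the `key|name|address|mail|contact|tel|web` layout is inferred from the script, `parse_record` and `timed_load` are hypothetical helper names, and a plain dictionary stands in for the Cassandra client.

```python
import time

COLUMN_NAMES = ["NAME", "ADDRESS", "MAIL", "CONTACT", "TEL", "WEB"]


def parse_record(line):
    """Split 'key|name|address|mail|contact|tel|web' into (key, {column: value})."""
    fields = line.rstrip("\n").split("|")
    return fields[0], dict(zip(COLUMN_NAMES, fields[1:]))


def timed_load(lines, insert):
    """Call insert(key, columns) for every line; return (count, inserts per second)."""
    start = time.perf_counter()
    count = 0
    for line in lines:
        key, columns = parse_record(line)
        insert(key, columns)
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


# A dict stands in for the database; store.__setitem__(key, columns) is the "insert".
store = {}
sample = ["42|Alice|1 Main St|alice@example.com|Bob|555-0100|example.com\n"]
count, rate = timed_load(sample, store.__setitem__)
print(f"{count} insert(s), {rate:.0f} per second")
```

Passing the insert function as a parameter keeps the timing loop identical for every client under test, so differences in the reported rate come from the client and not from the harness.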

On 18.03.2010 at 19:08, Roger Schildmeijer wrote:

> Yes, 30 writes/s sounds rather poor.
> 
> Maybe you could show your benchmark code? And what adjustments did you have to make to the CF?
> 
> // Roger


Re: write performance thrift interfaces

Posted by Roger Schildmeijer <sc...@gmail.com>.
Yes, 30 writes/s sounds rather poor.

Maybe you could show your benchmark code? And what adjustments did you have to make to the CF?

// Roger

