Posted to user@cassandra.apache.org by Heath Oderman <he...@526valley.com> on 2010/04/08 22:59:39 UTC

Very new user needs some troubleshooting pointers

Hi All,

I'm brand new to Cassandra and know absolutely nothing, so please forgive me
in advance.

A friend and I have each set up a few standalone Cassandra nodes, completely
default.

His: Mac OS X Snow Leopard
     MacBook Pro
     Intel Duo Core
     4GB RAM
     5400 rpm disk

Mine: Debian 5.x (lenny) with the deb package from
http://www.apache.org/dist/cassandra/debian
     2 desktops
      Intel duo core
      4GB RAM
      7200 rpm SATA drives

     1 blade
      8GB RAM
      10000 rpm disk
      dual Xeon

     (I have a Windows box too, similar to the 2 desktops.)

     (Each of those machines is standalone.)


My debian boxes are brand new installs, nothing else running, purely console
environments, only SSH & Cassandra installed.

The Cassandra configs are the *default configs* with only 'ListenAddress'
and 'ThriftAddress' changed to the external IP for those boxes.

We generated a C# library with Thrift to connect to these servers.  We wrote
a simple C# app that loops 10,000 times and does a

         _client.batch_insert(_keyspace, map.Key.GetValue(o, null).ToString(), dict, ConsistencyLevel.ONE);

"batch_insert" I guess is the key bit up there.

The reason that I'm writing is that the batch_insert call takes 400,000
ticks every time it is called when running against the debian boxes.  Any of
them.

The result is that 10,000 inserts against his machine take about 30
seconds, and they take about 1 min 45 seconds against any of my servers
(longer against the Windows 7 server).
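
For a rough sense of scale, the totals above can be turned into per-call latencies. This is only a sketch: it assumes the loop was strictly sequential, and it assumes "ticks" means .NET TimeSpan ticks of 100 ns each, which the post does not confirm (Stopwatch ticks depend on the machine's timer frequency and would give a different figure).

```python
# Per-insert latency implied by the wall-clock totals quoted above.
INSERTS = 10_000


def per_call_ms(total_seconds, calls=INSERTS):
    """Average latency per call, in milliseconds, for a sequential loop."""
    return total_seconds * 1000.0 / calls


mac_ms = per_call_ms(30)       # "about 30 seconds"
debian_ms = per_call_ms(105)   # "about 1 min 45 seconds"

# ASSUMPTION: one tick = 100 ns (TimeSpan ticks). Not confirmed by the post.
TICK_SECONDS = 100e-9
ticks_ms = 400_000 * TICK_SECONDS * 1000.0

print(f"Mac:    {mac_ms:.1f} ms per insert")
print(f"Debian: {debian_ms:.1f} ms per insert")
print(f"400,000 ticks at 100 ns/tick: {ticks_ms:.1f} ms")
```

Under that tick assumption the two measurements disagree: 40 ms per call would mean roughly 400 seconds for 10,000 calls, not 1 min 45 seconds. The tick unit is worth confirming before reading too much into either number.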

The MacBook Pro is faster, though I would expect it to be slower.  (The
MacBook Pro is his laptop, and he's running mail and all kinds of other stuff
simultaneously.)

I'm on a gigabit network; iostat, top, and bmon all show that the Cassandra
server isn't working very hard.

Performance Monitor on my Windows client shows that the computer running the
loop is hardly working.

I am writing to you to ask where I might go to get information on comparing
the environments, improving my performance, etc.  I've been googling all day
and haven't been able to figure anything out.

If this is the wrong forum, sorry!

Thanks for any help/suggestions you might have.
Stu

Re: Very new user needs some troubleshooting pointers

Posted by Jonathan Ellis <jb...@gmail.com>.

If you're only seeing 1-2 RPS then you should turn on debug logging to
see where the latency is.

On Fri, Apr 9, 2010 at 11:14 AM, Mark Jones <MJ...@imagehawk.com> wrote:
> Sounds like we are experiencing some of the same problems. (I’m using 0.6RC1) I
> have a 3 node cluster with 8GB/machine (dual core CPU).  I’m peaking on
> inserts at about 6000-7000/second running 40 threads.  Separate spindles for
> commitlog and data…..
>
>
>
> My read speed is atrocious, 800/sec sustained (starts off at 1800+/second
> and falls back to 800/sec).  Of course that is only if I read from the
> “correct” node.  Depending on the moment, 2 of the nodes will return
> 1-2/second instead of 800, and only one node will return 800/second.  And if
> I spread the reads across many nodes, all the performance drops.   nodetool
> loadbalance can change which node is the “golden” node, but I don’t know
> why.  I have doubled the # of concurrent read threads and seen some
> performance improvement (that was the last thing I tried, and it eked out
> another 150/second).
>
>
>
> So much about Cassandra makes me WANT it to work, I mean look at the fact
> that all nodes are essentially equal, that it replicates from rack to rack,
> from DC to DC, now, if I could just make it perform.
>
>
>
> My machines are basically idle (a large amount of IOWait, but the time is
> spent in the pending queue, vs the device svctime).  So far I’ve got little
> insight into what could be wrong, I’ve increased the key cache 10X using
> JConsole but the hit rate is still at times abysmal.
>
>
>
> I’m writing 400-800 byte blobs with an 8 byte key (supercolumn) and a 12
> byte “subkey”, then a 5 byte column name, something that would seem to be
> right up Cassandra’s alley.
>
>
>
> Right now I’m reworking my test to dump it into MySQL on the same machines,
> so I can compare the two for speed, because either I’ve got crap for
> hardware, or there is something rotten in Denmark.

Re: RE: Very new user needs some troubleshooting pointers

Posted by Heath Oderman <he...@526valley.com>.

Will do, thanks for the advice. :)

On Apr 9, 2010 12:28 PM, "Jonathan Ellis" <jb...@gmail.com> wrote:

The jit on debian may take longer to warm up by default.

Do 100k ops first before benchmarking.

Benchmark with multiple threads.

And use a known benchmark first like py_stress.


On Fri, Apr 9, 2010 at 11:23 AM, Heath Oderman <he...@526valley.com> wrote:
> What's interesting fo...

Re: RE: Very new user needs some troubleshooting pointers

Posted by Eric Evans <ee...@rackspace.com>.

On Fri, 2010-04-09 at 11:28 -0500, Jonathan Ellis wrote:
> The jit on debian may take longer to warm up by default.

Also, the Debian package will pull in OpenJDK by default, but there is
nothing to stop you from using the Sun JVM (which I assume is what's in
use on the other machines). It is even packaged in the non-free section
(sun-java6-jdk). If you do this, you'll have to set JAVA_HOME
(in /etc/default/cassandra) accordingly.

I'm not aware of any issues running under OpenJDK (performance or
otherwise), but if you're attempting to isolate the differences it might
be worth a try (and I'd be interested in hearing about it if you did).

-- 
Eric Evans
eevans@rackspace.com

Re: RE: Very new user needs some troubleshooting pointers

Posted by Jonathan Ellis <jb...@gmail.com>.

The jit on debian may take longer to warm up by default.

Do 100k ops first before benchmarking.

Benchmark with multiple threads.

And use a known benchmark first like py_stress.
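
The advice above (warm up first, then measure with several threads) can be sketched as a small harness. This is a minimal illustration and not py_stress itself: `run_benchmark`, `op`, and `fake_insert` are hypothetical names, and `fake_insert` only sleeps in place of a real client call.

```python
import threading
import time


def run_benchmark(op, total_ops, threads, warmup_ops=0):
    """Call op(i) total_ops times across `threads` workers; return ops/sec.

    `op` is a hypothetical stand-in for one client call (for example a
    Thrift batch_insert). In a real test each worker would use its own
    connection.
    """
    # Warm-up phase: untimed, gives the server-side JIT time to settle.
    for i in range(warmup_ops):
        op(i)

    counter = iter(range(total_ops))
    lock = threading.Lock()

    def worker():
        while True:
            with lock:                 # hand out each index exactly once
                i = next(counter, None)
            if i is None:
                return
            op(i)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.perf_counter() - start
    return total_ops / elapsed


if __name__ == "__main__":
    def fake_insert(i):
        time.sleep(0.001)   # placeholder latency, not a real Cassandra call

    for n in (1, 8):
        rate = run_benchmark(fake_insert, total_ops=400, threads=n, warmup_ops=50)
        print(f"{n} thread(s): {rate:.0f} ops/sec")
```

For a real run, `warmup_ops` would be set to something like the 100k operations suggested above, and the numbers from a harness like this should be checked against a known benchmark before being trusted.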

On Fri, Apr 9, 2010 at 11:23 AM, Heath Oderman <he...@526valley.com> wrote:
> What's interesting for my case is that I put a timer around the thrift
> batch_insert method.
>
> Every iteration of that call against Debian (any hardware, same network or
> in the Amazon cloud with a Windows machine in EC2 as well) takes 400,000 ticks.
> Super consistent.  One thread.
>
> My friend's setup with Cassandra on OS X takes 400,000 ticks for the first
> insert, then drops to 20,000 ticks for every subsequent call.
>
> That's what is so strange.

RE: RE: Very new user needs some troubleshooting pointers

Posted by Mark Jones <MJ...@imagehawk.com>.

I'm seeing an average write time of 20-30 ms/insert between the 60 and 67 million row points.
(I think at this point I was actually running 80 threads simultaneously: two 40-thread clients.)

From: Heath Oderman [mailto:heath@526valley.com]
Sent: Friday, April 09, 2010 11:23 AM
To: user@cassandra.apache.org
Subject: Re: RE: Very new user needs some troubleshooting pointers

What's interesting for my case is that I put a timer around the thrift batch_insert method.

Every iteration of that call against Debian (any hardware, same network or in the Amazon cloud with a Windows machine in EC2 as well) takes 400,000 ticks.  Super consistent.  One thread.

My friend's setup with Cassandra on OS X takes 400,000 ticks for the first insert, then drops to 20,000 ticks for every subsequent call.

That's what is so strange.

Re: RE: Very new user needs some troubleshooting pointers

Posted by Heath Oderman <he...@526valley.com>.

What's interesting for my case is that I put a timer around the thrift
batch_insert method.

Every iteration of that call against Debian (any hardware, same network or
in the Amazon cloud with a Windows machine in EC2 as well) takes 400,000 ticks.
Super consistent.  One thread.

My friend's setup with Cassandra on OS X takes 400,000 ticks for the first
insert, then drops to 20,000 ticks for every subsequent call.

That's what is so strange.


RE: Very new user needs some troubleshooting pointers

Posted by Mark Jones <MJ...@imagehawk.com>.

Sounds like we are experiencing some of the same problems. (I'm using 0.6RC1.) I have a 3 node cluster with 8GB/machine (dual core CPU).  I'm peaking on inserts at about 6000-7000/second running 40 threads, with separate spindles for commitlog and data.

My read speed is atrocious: 800/sec sustained (it starts off at 1800+/second and falls back to 800/sec).  Of course, that is only if I read from the "correct" node.  Depending on the moment, 2 of the nodes will return 1-2/second instead of 800, and only one node will return 800/second.  And if I spread the reads across many nodes, performance drops everywhere.  nodetool loadbalance can change which node is the "golden" node, but I don't know why.  I have doubled the number of concurrent read threads and seen some performance improvement (that was the last thing I tried, and it eked out another 150/second).

So much about Cassandra makes me WANT it to work. I mean, look at the fact that all nodes are essentially equal, and that it replicates from rack to rack and from DC to DC. Now, if I could just make it perform.

My machines are basically idle (a large amount of IOWait, but the time is spent in the pending queue rather than in device svctime).  So far I've got little insight into what could be wrong. I've increased the key cache 10X using JConsole, but the hit rate is still abysmal at times.

I'm writing 400-800 byte blobs with an 8 byte key (supercolumn) and a 12 byte "subkey", then a 5 byte column name, something that would seem to be right up Cassandra's alley.

Right now I'm reworking my test to dump it into MySQL on the same machines, so I can compare the two for speed, because either I've got crap for hardware, or there is something rotten in Denmark.

From: Heath Oderman [mailto:heath@526valley.com]
Sent: Friday, April 09, 2010 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Very new user needs some troubleshooting pointers

Thanks for the reply Jonathan!

I started with multi threaded tests, but when my performance was so much slower than my buddy's I switched to one to try to isolate and identify the differences.  I got tunnel vision and kept on with the one thread tests.

I'll modify the tests and try again.

Thanks,
Stu


Re: Very new user needs some troubleshooting pointers

Posted by Heath Oderman <he...@526valley.com>.

Thanks for the reply Jonathan!

I started with multithreaded tests, but when my performance was so much
slower than my buddy's I switched to one thread to try to isolate and identify
the differences.  I got tunnel vision and kept on with the one-thread tests.

I'll modify the tests and try again.

Thanks,
Stu

On Fri, Apr 9, 2010 at 11:34 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> A single-threaded test is meaningless.  You need a multithreaded (or
> multiprocess) benchmark like the one in contrib/py_stress.
>
> Picture worth 1000 words:
> http://spyced.blogspot.com/2010/01/cassandra-05.html

Re: Very new user needs some troubleshooting pointers

Posted by Jonathan Ellis <jb...@gmail.com>.

A single-threaded test is meaningless.  You need a multithreaded (or
multiprocess) benchmark like the one in contrib/py_stress.

Picture worth 1000 words: http://spyced.blogspot.com/2010/01/cassandra-05.html
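
One way to see why a single client thread says little about the server: a client that waits for each reply before sending the next request is capped by round-trip latency, no matter how idle the server is. The sketch below is illustrative arithmetic only. The 3 ms and 10.5 ms figures are 30 s and 105 s divided by 10,000 inserts from the original post, 25 ms is the midpoint of the 20-30 ms write time reported elsewhere in the thread, and the function name is made up. It also assumes latency stays constant as clients are added, which a real cluster will not guarantee.

```python
def max_ops_per_sec(clients, latency_ms):
    """Upper bound on throughput when every client waits for its reply
    before sending the next request (closed loop, no pipelining)."""
    return clients * 1000.0 / latency_ms


# (clients, per-op latency in ms); see the assumptions stated above.
scenarios = [(1, 3.0), (1, 10.5), (40, 3.0), (80, 25.0)]

for clients, latency_ms in scenarios:
    bound = max_ops_per_sec(clients, latency_ms)
    print(f"{clients:>3} client(s) at {latency_ms:>5.1f} ms/op "
          f"-> at most {bound:,.0f} ops/sec")
```

A one-thread loop therefore mostly measures round-trip latency, while a multithreaded run is what shows how much work the server can actually absorb.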
