Posted to user@cassandra.apache.org by Jeff Williams <je...@wherethebitsroam.com> on 2012/04/03 13:08:16 UTC

Write performance compared to Postgresql

Hi,

I am looking at Cassandra for a logging application. We currently log to a PostgreSQL database.

I set up 2 Cassandra servers for testing. I ran a benchmark in which I read 100 hashes representing log entries from a JSON file, then looped over them to do 10,000 log inserts. I repeated the same test writing to a PostgreSQL instance on one of the Cassandra servers. The script is attached. The Cassandra writes appear to perform a lot worse. Is this expected?

jeff@transcoder01:~$ ruby cassandra-bm.rb 
cassandra
  3.170000   0.480000   3.650000 ( 12.032212)
jeff@transcoder01:~$ ruby cassandra-bm.rb 
postgres
  2.140000   0.330000   2.470000 (  7.002601)

Regards,
Jeff
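
The timings above can be turned into per-insert figures. This is only a rough sketch: it assumes the columns are the usual Ruby Benchmark fields (user, system, total CPU, then elapsed real time in parentheses) and that each run did exactly 10,000 inserts. The names `runs` and `summarize` are made up for illustration.

```python
# Figures copied from the benchmark output in the post above.
INSERTS = 10_000
runs = {
    "cassandra": {"cpu_s": 3.65, "wall_s": 12.032212},
    "postgres": {"cpu_s": 2.47, "wall_s": 7.002601},
}


def summarize(cpu_s, wall_s, inserts=INSERTS):
    """Per-insert latency, throughput, and share of wall time not spent on client CPU."""
    return {
        "ms_per_insert": wall_s / inserts * 1000,
        "inserts_per_s": inserts / wall_s,
        # Assumption: wall time minus client CPU time is mostly waiting
        # on the network and the server.
        "non_cpu_fraction": (wall_s - cpu_s) / wall_s,
    }


summary = {name: summarize(**run) for name, run in runs.items()}
for name, s in summary.items():
    print(f"{name:9s} {s['ms_per_insert']:.2f} ms/insert, "
          f"{s['inserts_per_s']:.0f} inserts/s, "
          f"{s['non_cpu_fraction']:.0%} of wall time outside client CPU")
```

In both runs the client process appears to spend well over half of the elapsed time waiting rather than computing, which would be consistent with a serial, one-request-at-a-time loop.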


RE: Write performance compared to Postgresql

Posted by Jeremiah Jordan <JE...@morningstar.com>.
Cassandra may or may not be faster than your current system when you only have a couple of connections. Where it is faster, and where it scales, is when you have hundreds of clients across many nodes.

See:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

With 60 clients running 200 threads each, they were able to get 10K writes per second per server, and as they added servers from 48 to 288, each server still got 10K writes per second, so the aggregate writes per second went from 48*10K to 288*10K.

-Jeremiah
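
Spelled out, the aggregate figures look like this. The sketch simply multiplies the per-server rate quoted in the message by the server count, so it assumes perfectly linear scaling. The function name is illustrative and the 10K figure is taken from the message, not re-measured.

```python
PER_SERVER_WRITES_PER_S = 10_000  # per-server rate quoted in the message above


def aggregate_writes_per_s(servers, per_server=PER_SERVER_WRITES_PER_S):
    """Aggregate throughput if every server sustains the same rate."""
    return servers * per_server


for servers in (48, 288):
    print(f"{servers} servers -> {aggregate_writes_per_s(servers):,} writes/s")
```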

________________________________________
From: Jeff Williams [jeffw@wherethebitsroam.com]
Sent: Tuesday, April 03, 2012 10:09 AM
To: user@cassandra.apache.org
Subject: Re: Write performance compared to Postgresql

Vitalii,

Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client?

Because even with 3 concurrent client nodes, my single postgresql server is still out performing my 2 node cassandra cluster, although the gap is narrowing.

Jeff



Re: Write performance compared to Postgresql

Posted by Віталій Тимчишин <ti...@gmail.com>.
Hello.

We are using the Java async Thrift client.
As for Ruby, it seems you need to use something like
http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/
(I'm not sure, as I know nothing about Ruby).

Best regards, Vitalii Tymchyshyn


2012/4/3 Jeff Williams <je...@wherethebitsroam.com>

> Vitalii,
>
> Yep, that sounds like a good idea. Do you have any more information about
> how you're doing that? Which client?
>
> Because even with 3 concurrent client nodes, my single postgresql server
> is still out performing my 2 node cassandra cluster, although the gap is
> narrowing.
>
> Jeff


-- 
Best regards,
 Vitalii Tymchyshyn

Re: Write performance compared to Postgresql

Posted by Jeff Williams <je...@wherethebitsroam.com>.
Just to follow this up, I repeated the test with a multi-threaded Java (Hector) client and was able to get much better performance: 10,000 rows in just over a second. So it looks like the client latency was the killer, and I have since read that the Ruby Thrift implementation is not the fastest.
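
As a back-of-the-envelope check, Little's law (requests in flight = throughput x latency) gives a feel for how much concurrency that result implies. The sketch assumes each write would still take about as long as in the serial Ruby run, which is probably pessimistic given the comment about the Ruby Thrift implementation, so treat the result as an upper-end estimate rather than a measurement.

```python
import math


def required_in_flight(target_per_s, latency_s):
    """Little's law: requests in flight = throughput * latency."""
    return target_per_s * latency_s


# Assumption: per-write latency taken from the single-threaded Ruby run
# (12.032212 s for 10,000 inserts).
serial_latency_s = 12.032212 / 10_000
target_per_s = 10_000  # "10,000 rows in just over a second"

needed = required_in_flight(target_per_s, serial_latency_s)
print(f"about {math.ceil(needed)} requests in flight")
```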

On Apr 4, 2012, at 9:11 AM, Jeff Williams wrote:

> On three machines on the same subnet as the two cassandra nodes.
> 
> On Apr 3, 2012, at 6:40 PM, Collard, David L (Dave) wrote:
> 
>> Where is your client running?


Re: Write performance compared to Postgresql

Posted by Jeff Williams <je...@wherethebitsroam.com>.
On three machines on the same subnet as the two cassandra nodes.

On Apr 3, 2012, at 6:40 PM, Collard, David L (Dave) wrote:

> Where is your client running?


RE: Write performance compared to Postgresql

Posted by "Collard, David L (Dave)" <da...@alcatel-lucent.com>.
Where is your client running?

-----Original Message-----
From: Jeff Williams [mailto:jeffw@wherethebitsroam.com] 
Sent: Tuesday, April 03, 2012 11:09 AM
To: user@cassandra.apache.org
Subject: Re: Write performance compared to Postgresql

Vitalii,

Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client?

Because even with 3 concurrent client nodes, my single postgresql server is still out performing my 2 node cassandra cluster, although the gap is narrowing.

Jeff



Re: Write performance compared to Postgresql

Posted by Jeff Williams <je...@wherethebitsroam.com>.
Vitalii,

Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client?

Because even with 3 concurrent client nodes, my single PostgreSQL server is still outperforming my 2-node Cassandra cluster, although the gap is narrowing.

Jeff

On Apr 3, 2012, at 4:08 PM, Vitalii Tymchyshyn wrote:

> Note that having tons of TCP connections is not good. We are using async client to issue multiple calls over single connection at same time. You can do the same.
> 
> Best regards, Vitalii Tymchyshyn.


Re: Write performance compared to Postgresql

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
Note that having tons of TCP connections is not good. We are using an async
client to issue multiple calls over a single connection at the same time. You
can do the same.

Best regards, Vitalii Tymchyshyn.
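
The idea of many calls in flight over one connection can be sketched as follows. This is a simulation only: `FakeAsyncConnection` is a hypothetical stand-in, not a Thrift or Cassandra API, and the 5 ms round trip and the cap of 32 outstanding requests are assumed values.

```python
import asyncio
import time

SIMULATED_ROUND_TRIP_S = 0.005  # assumed latency per write; not measured


class FakeAsyncConnection:
    """Hypothetical stand-in for one connection with many requests in flight."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    async def insert(self, row):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(SIMULATED_ROUND_TRIP_S)  # wait for the "reply"
        self.in_flight -= 1
        return row


async def write_all(conn, rows, max_in_flight=32):
    limit = asyncio.Semaphore(max_in_flight)  # cap outstanding requests

    async def one(row):
        async with limit:
            return await conn.insert(row)

    # gather() returns results in the same order as the input rows
    return await asyncio.gather(*(one(row) for row in rows))


conn = FakeAsyncConnection()
start = time.perf_counter()
results = asyncio.run(write_all(conn, range(200)))
elapsed_s = time.perf_counter() - start
print(f"{len(results)} writes, up to {conn.max_in_flight} in flight, {elapsed_s:.3f}s")
```

Issued one at a time, the same 200 simulated writes would take at least a second. With requests overlapping on the one fake connection, they should finish in a small fraction of that.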

03.04.12 16:18, Jeff Williams wrote:
> Ok, so you think the write speed is limited by the client and protocol, rather than the cassandra backend? This sounds reasonable, and fits with our use case, as we will have several servers writing. However, a bit harder to test!
>
> Jeff


Re: Write performance compared to Postgresql

Posted by Jeff Williams <je...@wherethebitsroam.com>.
Ok, so you think the write speed is limited by the client and protocol, rather than the cassandra backend? This sounds reasonable, and fits with our use case, as we will have several servers writing. However, a bit harder to test!

Jeff

On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote:

> Hi Jeff,
> 
> Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve. 
> 
> Jake


Re: Write performance compared to Postgresql

Posted by Jake Luciani <ja...@gmail.com>.
Hi Jeff,

Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve. 

Jake
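
A small simulation shows the effect. It is only a sketch: `fake_insert` is a hypothetical stand-in for one blocking client write, the 5 ms round trip is an assumed figure, and a real client would normally give each thread its own connection.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_ROUND_TRIP_S = 0.005  # assumed latency per write; not measured


def fake_insert(row):
    """Hypothetical stand-in for one blocking client write."""
    time.sleep(SIMULATED_ROUND_TRIP_S)
    return row


def run_serial(rows):
    """Issue the writes one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for row in rows:
        fake_insert(row)
    return time.perf_counter() - start


def run_threaded(rows, workers=10):
    """Issue the writes from a pool of threads and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes every result so the timing covers all writes
        list(pool.map(fake_insert, rows))
    return time.perf_counter() - start


rows = list(range(100))
serial_s = run_serial(rows)
threaded_s = run_threaded(rows)
print(f"serial:   {serial_s:.3f}s")
print(f"threaded: {threaded_s:.3f}s")
```

The serial loop pays the full round trip for every write, while the threaded version overlaps the waits, so its elapsed time should be roughly the serial time divided by the number of workers.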

 

On Apr 3, 2012, at 7:08 AM, Jeff Williams <je...@wherethebitsroam.com> wrote:

> Hi,
> 
> I am looking at cassandra for a logging application. We currently log to a Postgresql database.
> 
> I set up 2 cassandra servers for testing. I did a benchmark where I had 100 hashes representing logs entries, read from a json file. I then looped over these to do 10,000 log inserts. I repeated the same writing to a postgresql instance on one of the cassandra servers. The script is attached. The cassandra writes appear to perform a lot worse. Is this expected?
> 
> jeff@transcoder01:~$ ruby cassandra-bm.rb 
> cassandra
>  3.170000   0.480000   3.650000 ( 12.032212)
> jeff@transcoder01:~$ ruby cassandra-bm.rb 
> postgres
>  2.140000   0.330000   2.470000 (  7.002601)
> 
> Regards,
> Jeff
> 
> <cassandra-bm.rb>