Posted to user@cassandra.apache.org by Jeff Williams <je...@wherethebitsroam.com> on 2012/04/03 13:08:16 UTC
Write performance compared to Postgresql
Hi,
I am looking at Cassandra for a logging application. We currently log to a PostgreSQL database.
I set up 2 Cassandra servers for testing. I did a benchmark where I had 100 hashes representing log entries, read from a JSON file. I then looped over these to do 10,000 log inserts. I repeated the same writes against a PostgreSQL instance on one of the Cassandra servers. The script is attached. The Cassandra writes appear to perform a lot worse. Is this expected?
jeff@transcoder01:~$ ruby cassandra-bm.rb
cassandra
3.170000 0.480000 3.650000 ( 12.032212)
jeff@transcoder01:~$ ruby cassandra-bm.rb
postgres
2.140000 0.330000 2.470000 ( 7.002601)
Regards,
Jeff
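Ruby's `Benchmark` module prints four figures per run: user CPU, system CPU, total CPU and, in parentheses, elapsed wall-clock time. The minimal sketch below turns the two result lines above into throughput and mean time per insert. It assumes each run performed exactly the 10,000 inserts described, and the `Run` struct and `parse_run` helper are hypothetical names. For the Cassandra run the client was off-CPU for roughly 8.4 of the 12.0 seconds, which is consistent with the time being dominated by waiting on round trips rather than by client-side work.

```ruby
# Derive throughput and per-insert latency from Ruby Benchmark output lines.
# Column order follows Benchmark's default format: user, system, total, (real).
Run = Struct.new(:label, :user, :system, :total, :real)

BENCH_LINE = /\A\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+\(\s*([\d.]+)\)\s*\z/

# Hypothetical helper: turn one printed Benchmark line into a Run.
def parse_run(label, line)
  match = BENCH_LINE.match(line)
  raise ArgumentError, "unrecognised benchmark line: #{line.inspect}" unless match
  Run.new(label, *match.captures.map(&:to_f))
end

INSERTS = 10_000 # assumption: each run did exactly 10,000 inserts

RUNS = [
  parse_run("cassandra", "3.170000 0.480000 3.650000 ( 12.032212)"),
  parse_run("postgres",  "2.140000 0.330000 2.470000 ( 7.002601)")
].freeze

RUNS.each do |run|
  writes_per_sec = INSERTS / run.real          # wall-clock throughput
  ms_per_insert  = run.real * 1000.0 / INSERTS # mean time per insert
  off_cpu        = run.real - run.total        # time the client spent waiting
  printf("%-10s %7.1f writes/s  %.3f ms/insert  %.2f s off-CPU\n",
         run.label, writes_per_sec, ms_per_insert, off_cpu)
end
```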
RE: Write performance compared to Postgresql
Posted by Jeremiah Jordan <JE...@morningstar.com>.
So Cassandra may or may not be faster than your current system when you only have a couple of connections. Where it is faster, and where it scales, is when you have hundreds of clients across many nodes.
See:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
With 60 clients running 200 threads each, they were able to get 10K writes per second per server, and as they scaled from 48 to 288 servers they still got 10K writes per second per server, so the aggregate rate went from 48 × 10K (480K) to 288 × 10K (2.88M) writes per second.
-Jeremiah
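The arithmetic above assumes linear scaling: aggregate throughput is the node count times a constant per-node rate. A small sketch makes that explicit; the 10K per-node figure is the one quoted in the message, and the helper name is hypothetical.

```ruby
# Aggregate write throughput under a linear-scaling assumption:
# every node sustains the same per-node rate regardless of cluster size.
PER_NODE_WRITES_PER_SEC = 10_000 # per-server figure quoted above

def aggregate_writes_per_sec(nodes, per_node = PER_NODE_WRITES_PER_SEC)
  raise ArgumentError, "nodes must be positive" unless nodes.positive?
  nodes * per_node
end

[48, 288].each do |nodes|
  puts format("%3d nodes -> %9d writes/s", nodes, aggregate_writes_per_sec(nodes))
end
```

Real clusters rarely scale perfectly; the point of the benchmark cited was that the per-node rate stayed roughly flat as nodes were added.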
Re: Write performance compared to Postgresql
Posted by Віталій Тимчишин <ti...@gmail.com>.
Hello.
We are using the Java async Thrift client.
As for Ruby, it seems you need to use something like
http://www.mikeperham.com/2010/02/09/cassandra-and-eventmachine/
(I'm not sure, as I know nothing about Ruby.)
Best regards, Vitalii Tymchyshyn
Re: Write performance compared to Postgresql
Posted by Jeff Williams <je...@wherethebitsroam.com>.
Just to follow this up, I repeated the test with a multi-threaded Java (Hector) client and was able to get much better performance: 10,000 rows in just over a second. So it looks like client latency was the killer, and I have since read that the Ruby Thrift implementation is not the fastest.
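One way to see why concurrency made the difference is Little's law: sustained throughput ≈ requests in flight ÷ mean latency per request. A minimal sketch, assuming (hypothetically) that one write round trip costs about the 1.2 ms per insert that the serial Ruby run averaged: a single serial connection then tops out near 830 writes per second, while about 10,000 writes per second needs roughly 12 writes in flight at once, which a multi-threaded client can supply. The latency figure is derived from the original Ruby benchmark, not measured for the Hector client.

```ruby
# Little's law: L = lambda * W
#   L      = average number of requests in flight (concurrency)
#   lambda = throughput (requests per second)
#   W      = average latency per request (seconds)

def required_concurrency(target_per_sec, latency_sec)
  target_per_sec * latency_sec
end

def max_throughput(concurrency, latency_sec)
  concurrency / latency_sec
end

# Assumption: ~1.2 ms per write, i.e. 12.032212 s / 10,000 serial inserts.
LATENCY = 12.032212 / 10_000

puts format("serial (1 in flight): %.0f writes/s", max_throughput(1, LATENCY))
puts format("in flight for 10k/s : %.1f", required_concurrency(10_000, LATENCY))
```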
Re: Write performance compared to Postgresql
Posted by Jeff Williams <je...@wherethebitsroam.com>.
On three machines on the same subnet as the two Cassandra nodes.
On Apr 3, 2012, at 6:40 PM, Collard, David L (Dave) wrote:
> Where is your client running?
RE: Write performance compared to Postgresql
Posted by "Collard, David L (Dave)" <da...@alcatel-lucent.com>.
Where is your client running?
Re: Write performance compared to Postgresql
Posted by Jeff Williams <je...@wherethebitsroam.com>.
Vitalii,
Yep, that sounds like a good idea. Do you have any more information about how you're doing that? Which client?
I ask because even with 3 concurrent client nodes, my single PostgreSQL server is still outperforming my 2-node Cassandra cluster, although the gap is narrowing.
Jeff
On Apr 3, 2012, at 4:08 PM, Vitalii Tymchyshyn wrote:
> Note that having tons of TCP connections is not good. We are using an async client to issue multiple calls over a single connection at the same time. You can do the same.
Re: Write performance compared to Postgresql
Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
Note that having tons of TCP connections is not good. We are using an async
client to issue multiple calls over a single connection at the same time. You
can do the same.
Best regards, Vitalii Tymchyshyn.
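The idea of issuing several calls over one connection without waiting for each reply can be sketched without any Cassandra or Thrift code. In the toy below, `ToyServer` and `MultiplexedClient` are hypothetical stand-ins and two `Queue` objects play the role of a socket: the client tags each request with an id, sends it immediately, and a single reader thread routes each reply back to its caller by id, so replies may arrive out of order. This is not the Thrift API, and whether a particular client library allows multiple outstanding calls on one connection is something to check in that library's documentation.

```ruby
# Toy server: requests arrive on one queue, replies leave on another.
class ToyServer
  attr_reader :inbox, :outbox

  def initialize(latency)
    @inbox = Queue.new  # client -> server direction of the "connection"
    @outbox = Queue.new # server -> client direction
    @latency = latency
    Thread.new do
      loop do
        id, payload = @inbox.pop
        # Handle each request concurrently, so replies can come back out of order.
        Thread.new do
          sleep @latency
          @outbox << [id, "ok:#{payload}"]
        end
      end
    end
  end
end

# Many in-flight calls share one connection; replies are matched by request id.
class MultiplexedClient
  def initialize(server)
    @server = server
    @next_id = 0
    @pending = {} # request id -> one-shot mailbox for the reply
    @lock = Mutex.new
    # A single reader thread demultiplexes replies back to their callers.
    Thread.new do
      loop do
        id, result = @server.outbox.pop
        mailbox = @lock.synchronize { @pending.delete(id) }
        mailbox&.push(result)
      end
    end
  end

  # Sends the request without waiting and returns a mailbox to read the reply from.
  def call_async(payload)
    mailbox = Queue.new
    id = @lock.synchronize do
      @next_id += 1
      @pending[@next_id] = mailbox
      @next_id
    end
    @server.inbox << [id, payload]
    mailbox
  end
end

client = MultiplexedClient.new(ToyServer.new(0.05))

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
mailboxes = (1..20).map { |i| client.call_async("log-#{i}") } # 20 calls in flight
results = mailboxes.map(&:pop)                                # collect every reply
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "#{results.size} replies in #{elapsed.round(2)} s"
```

With a 50 ms simulated latency, 20 calls made one after another would take about a second; overlapping them on the single connection brings the total close to one round trip.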
Re: Write performance compared to Postgresql
Posted by Jeff Williams <je...@wherethebitsroam.com>.
OK, so you think the write speed is limited by the client and protocol rather than the Cassandra backend? That sounds reasonable, and fits our use case, as we will have several servers writing. It is a bit harder to test, though!
Jeff
On Apr 3, 2012, at 1:27 PM, Jake Luciani wrote:
> Hi Jeff,
>
> Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve.
Re: Write performance compared to Postgresql
Posted by Jake Luciani <ja...@gmail.com>.
Hi Jeff,
Writing serially over one connection will be slower. If you run many threads hitting the server at once you will see throughput improve.
Jake
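Jake's point can be reproduced without a cluster. A minimal sketch, assuming a hypothetical `fake_insert` that simply sleeps for a fixed round-trip time in place of a real Cassandra write: the serial loop pays every round trip back to back, while a small pool of threads overlaps them. Ruby releases its global VM lock while a thread sleeps or blocks on I/O, so threads help here even on MRI. With a real client you would typically give each thread its own connection unless the library documents that a client object can be shared.

```ruby
# Serial vs. concurrent writers against a simulated round-trip latency.
ROUND_TRIP = 0.01 # hypothetical seconds per blocking insert
WRITES     = 100

# Hypothetical stand-in for one blocking insert over the network.
def fake_insert(_entry)
  sleep ROUND_TRIP
end

# Wall-clock seconds taken by the given block.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# A fixed pool of worker threads drains a shared queue, so up to
# `workers` inserts are waiting on the "network" at the same time.
def threaded_insert(entries, workers)
  queue = Queue.new
  entries.each { |entry| queue << entry }
  queue.close # pop returns nil once the closed queue is empty
  threads = Array.new(workers) do
    Thread.new do
      while (entry = queue.pop)
        fake_insert(entry)
      end
    end
  end
  threads.each(&:join)
end

entries = (1..WRITES).map { |i| { "id" => i, "msg" => "log entry #{i}" } }

serial     = timed { entries.each { |entry| fake_insert(entry) } }
concurrent = timed { threaded_insert(entries, 10) }

puts format("serial: %.2f s, 10 threads: %.2f s", serial, concurrent)
```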