Posted to user@cassandra.apache.org by Julian Simon <js...@jules.com.au> on 2010/03/30 07:42:34 UTC

Poor performance; PHP & Thrift to blame

Hi,

I've been trying to benchmark Cassandra for our use case and have been
seeing poor performance on both writes and (extremely) poor
performance on reads.

Using Cassandra 0.5.1 stable & thrift-0.2.0.

It turns out all the CPU time is going to the PHP client process - the
JVM operating the Cassandra server isn't breaking much of a sweat.

For reads the latency is often up to 1 second to fetch a row
containing ~2000 columns, or around 300ms to fetch a 500-column wide
row.  This is with get_slice(), and a predicate specifying the start &
finish range.
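
As a quick sanity check on those figures (a back-of-the-envelope sketch in Python, not a new measurement; the names are made up for illustration), dividing latency by column count gives a similar per-column cost in both cases, which is what you would expect if the time goes into per-column work in the client:

```python
# Rough per-column cost implied by the two latencies quoted above.
# The figures are the approximate ones from this post, not new measurements.
samples = {
    2000: 1.0,   # ~2000 columns -> up to ~1 second
    500: 0.3,    # ~500 columns  -> ~300 ms
}

def per_column_ms(columns, seconds):
    """Return the average milliseconds spent per column."""
    return seconds * 1000.0 / columns

for columns, seconds in sorted(samples.items()):
    print("%5d columns: %.2f ms per column" % (columns, per_column_ms(columns, seconds)))
```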

Using cachegrind and inspecting the code inside the Thrift bindings
makes it pretty clear why the performance is so bad, particularly on
reads. The biggest culprit is the translation code which casts data
back and forth into binary representations for sending over the wire
to the Cassandra server.

There seems to be some 32-bit-specific code which iterates heavily,
apparently due to a limitation in PHP's implementation of LONGs.

However, testing on a 64-bit host doesn't yield any performance improvement.
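
To illustrate the kind of work involved (a sketch in Python rather than the actual Thrift PHP source, with helper names invented for this example): a signed 64-bit value can be written either in one packing call or as two 32-bit halves, which is roughly what code has to do when native integers are only 32 bits wide.

```python
import struct

def pack_i64_native(value):
    """Pack a signed 64-bit integer big-endian in a single call."""
    return struct.pack(">q", value)

def pack_i64_as_halves(value):
    """Pack the same value as two unsigned 32-bit big-endian words.

    This mimics what a platform without 64-bit integers has to do:
    split the value into a high and a low word and emit each one.
    """
    unsigned = value & 0xFFFFFFFFFFFFFFFF   # two's-complement view
    high = (unsigned >> 32) & 0xFFFFFFFF
    low = unsigned & 0xFFFFFFFF
    return struct.pack(">II", high, low)

# Both routes must produce identical bytes on the wire.
for sample in (0, 1, -1, 1269934954000000, -(2 ** 63)):
    assert pack_i64_native(sample) == pack_i64_as_halves(sample)
```

The split version does more arithmetic per value, so its cost adds up when a row carries thousands of 64-bit fields such as timestamps.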

More surprisingly, if I compile and enable the PHP native thrift
bindings (following this guide
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
read performance actually degrades by another 50%.  I have verified
that the Thrift code is recognizing and using the native PHP functions
provided by the library.

I've tested all of this on both 32-bit and 64-bit installations of
both PHP 5.1 & 5.2.  Results are the same in all cases.

My environment is vanilla CentOS 5.4 server installations inside
VMware on a 4-core 64-bit host with plenty of RAM and fast disks.

Has anyone been able to produce decent performance with PHP &
Cassandra?  If so, how have you done it?

Thanks,
Jules

Re: Poor performance; PHP & Thrift to blame

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-03-30 05:42, Julian Simon wrote:
> More surprisingly, if I compile and enable the PHP native thrift
> bindings (following this guide
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> read performance actually degrades by another 50%.  I have verified
> that the Thrift code is recognizing and using the native PHP functions
> provided by the library.

I'm the author of that guide. If offloading to the C-based extension
does not speed things up, something is wrong. As noted in the guide, it
had a massive effect on the performance I got. However, I was testing
with many threads making many small requests (under 10 columns each).

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Poor performance; PHP & Thrift to blame

Posted by Juho Mäkinen <ju...@gmail.com>.
Beware that the native Thrift PHP bindings have a bug which might
change the types of provided arguments. Check out the bug report which I
filed:
https://issues.apache.org/jira/browse/THRIFT-796

 - Garo

On Fri, Aug 20, 2010 at 10:35 AM, sasha <sa...@gmail.com> wrote:
> I had exactly the same problem: without the native Thrift bindings, performance
> was low and PHP used too much CPU. But when I compiled and enabled the native
> Thrift bindings (following this guide:
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), performance
> became even lower: it degraded SEVERAL TIMES (although CPU usage decreased too).
>
> After several random tries I discovered that the buffer size matters. I
> mean the second and third arguments to "new TBufferedTransport($socket, X,
> Y)". But the most surprising fact is that it matters much more when using
> the native Thrift bindings than when not using them.
>
> I.e.:
> - get_range_slices without native thrift bindings (either small or large buffer
> size): ~1sec.
> - get_range_slices with native thrift bindings and small buffer size (1024):
> ~5sec!
> - get_range_slices with native thrift bindings and large buffer size (40960):
> ~0.1sec.
>
> I don't know why!!
>
> P.S.: cassandra 0.6.3.
>
>
>

Re: Poor performance; PHP & Thrift to blame

Posted by Thorvaldsson Justus <ju...@svenskaspel.se>.
Search the mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/
This has already been addressed and is a PHP-only issue.
The 5 seconds is a timeout; if I remember correctly, it happens because the packet size is too small, or something like that.
You can configure it so that it stops being a problem, but I don't use PHP, so search the mailing list.

-----Original message-----
From: sasha [mailto:sasha2048@gmail.com]
Sent: 20 August 2010 09:36
To: user@cassandra.apache.org
Subject: Re: Poor performance; PHP & Thrift to blame

I had exactly the same problem: without the native Thrift bindings, performance
was low and PHP used too much CPU. But when I compiled and enabled the native
Thrift bindings (following this guide:
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), performance
became even lower: it degraded SEVERAL TIMES (although CPU usage decreased too).

After several random tries I discovered that the buffer size matters. I
mean the second and third arguments to "new TBufferedTransport($socket, X,
Y)". But the most surprising fact is that it matters much more when using
the native Thrift bindings than when not using them.

I.e.:
- get_range_slices without native thrift bindings (either small or large buffer 
size): ~1sec.
- get_range_slices with native thrift bindings and small buffer size (1024): 
~5sec!
- get_range_slices with native thrift bindings and large buffer size (40960): 
~0.1sec.

I don't know why!!

P.S.: cassandra 0.6.3.



Re: Poor performance; PHP & Thrift to blame

Posted by sasha <sa...@gmail.com>.
I had exactly the same problem: without the native Thrift bindings, performance
was low and PHP used too much CPU. But when I compiled and enabled the native
Thrift bindings (following this guide:
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP), performance
became even lower: it degraded SEVERAL TIMES (although CPU usage decreased too).

After several random tries I discovered that the buffer size matters. I
mean the second and third arguments to "new TBufferedTransport($socket, X,
Y)". But the most surprising fact is that it matters much more when using
the native Thrift bindings than when not using them.

I.e.:
- get_range_slices without native thrift bindings (either small or large buffer 
size): ~1sec.
- get_range_slices with native thrift bindings and small buffer size (1024): 
~5sec!
- get_range_slices with native thrift bindings and large buffer size (40960): 
~0.1sec.

I don't know why!!

P.S.: cassandra 0.6.3.
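
One way to picture the buffer-size effect (a toy model in Python, not the Thrift PHP transport; the class and function names are invented for this sketch): a buffered reader refills its buffer from the underlying socket whenever it runs dry, so a small buffer means many more underlying reads for the same response. The model only counts reads; it does not by itself account for a jump to ~5 seconds.

```python
import io

class CountingSource(object):
    """Wraps a byte stream and counts how many times it is read."""
    def __init__(self, payload):
        self._stream = io.BytesIO(payload)
        self.reads = 0

    def read(self, size):
        self.reads += 1
        return self._stream.read(size)

class BufferedReader(object):
    """Minimal read buffer: refills from the source in chunks of buf_size."""
    def __init__(self, source, buf_size):
        self._source = source
        self._buf_size = buf_size
        self._buffer = b""

    def read(self, size):
        while len(self._buffer) < size:
            chunk = self._source.read(self._buf_size)
            if not chunk:
                break  # source exhausted
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

def underlying_reads(payload_size, buf_size, field_size=16):
    """Read a payload field by field and report the underlying read count."""
    source = CountingSource(b"x" * payload_size)
    reader = BufferedReader(source, buf_size)
    while reader.read(field_size):
        pass
    return source.reads

for size in (1024, 40960):
    print("buffer %5d bytes: %d underlying reads" % (size, underlying_reads(200000, size)))
```

The 200000-byte payload and 16-byte field size are arbitrary choices for the illustration; the point is only that the read count scales inversely with the buffer size.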



Re: Poor performance; PHP & Thrift to blame

Posted by Julian Simon <js...@jules.com.au>.
Well, the app is written in PHP, and in order to use Cassandra for the
(small) aspect of the app which could make use of its benefits, the
client code will need to be in PHP and run fairly speedily.

Hence my testing with PHP.

I suppose another question for me is: Are there any alternative
interfaces to Cassandra that don't involve the Thrift layer?



On Tue, Mar 30, 2010 at 11:15 PM, David Timothy Strauss
<da...@fourkitchens.com> wrote:
> This sounds like the sort of analysis that shouldn't be done in PHP. Have you tried Hadoop + Cassandra 0.6?
>
> -----Original Message-----
> From: Julian Simon <js...@jules.com.au>
> Date: Tue, 30 Mar 2010 22:21:22
> To: <us...@cassandra.apache.org>
> Subject: Re: Poor performance; PHP & Thrift to blame
>
> Yes I tested it with and without APC - it had a negligible impact on
> performance.
>
> This didn't surprise me - most of the optimization that APC offers is
> in the parsing of PHP code; seeing as the benchmark is a single PHP
> process the code parsing overhead occurs outside the benchmark loop.
>
> Does anyone have any benchmarks for larger Cassandra queries from PHP
> similar to what I'm trying to do?  The performance bottlenecks don't
> show up on 1,5,10, or even 100 column query sets - only for larger
> sets or query loops.
>
> Anyone doing time series analysis?  This is the sort of use case where
> I'd expect to see much larger query sets.
>
> I suppose Facebook and Digg are only pulling out small column sets, so
> they wouldn't necessarily notice this issue.
>
>
>
> On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
> <da...@fourkitchens.com> wrote:
>> Without APC, there should be even more of an improvement with the Thrift PHP extension.
>>
>> ----- "Rauan Maemirov" <ra...@maemirov.com> wrote:
>>
>>> What about APC? Did you turn it on?
>>>
>>
>> --
>> David Strauss
>>   | david@fourkitchens.com
>>   | +1 512 577 5827 [mobile]
>> Four Kitchens
>>   | http://fourkitchens.com
>>   | +1 512 454 6659 [office]
>>   | +1 512 870 8453 [direct]
>>
>

Re: Poor performance; PHP & Thrift to blame

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-03-30 12:51, yaw wrote:
> I have seen your guide at
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP.
> 
> I use Cassandra with a PHP client.
> Until now, I have been using the Thrift PHP classes that I found in the Pandra
> project (a high-level PHP client), as I was unable to install or build the
> Thrift compiler on my old Debian Etch OS.

You pretty much have to if you want to generate a Thrift client
supporting the right API for the version of Cassandra you're running. If
it's too hard to generate on your Etch machine, spin up a VM (locally or
on some cloud) and do it on there. The generated PHP client is
completely portable as long as it's used against the same Cassandra
server version.

> I cannot find the native PHP extension you are speaking about... I don't
> understand whether this extension can replace the PHP classes that are generated
> with the Thrift compiler.

The generated Thrift client automatically makes use of the PHP
extension, if available, for certain small, computationally intense
portions of the code. You have to use TBinaryProtocolAccelerated for
this to work.
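
The fallback behaviour described here follows a common pattern, sketched below in Python with invented names (this is not the Thrift PHP code, and the registry is hypothetical): use an accelerated routine when one is registered, and drop back to the plain implementation otherwise.

```python
import struct

def encode_i32_plain(value):
    """Pure-Python fallback: big-endian signed 32-bit integer."""
    return struct.pack(">i", value)

# Hypothetical registry of accelerated routines; in a real setup a
# compiled extension would populate this when it is loaded.
ACCELERATED = {}

def get_encoder(name, fallback):
    """Return the accelerated routine if one is registered, else the fallback."""
    return ACCELERATED.get(name, fallback)

# No extension is registered here, so the plain encoder is selected.
encode = get_encoder("encode_i32", encode_i32_plain)
print(encode(1))
```

Because the caller only ever sees `encode`, the same client code runs with or without the extension; only the speed differs.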

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Poor performance; PHP & Thrift to blame

Posted by yaw <ya...@gmail.com>.
Hi David,
I have seen your guide at
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP.

I use Cassandra with a PHP client.
Until now, I have been using the Thrift PHP classes that I found in the Pandra
project (a high-level PHP client), as I was unable to install or build the
Thrift compiler on my old Debian Etch OS.


I cannot find the native PHP extension you are speaking about... I don't
understand whether this extension can replace the PHP classes that are generated
with the Thrift compiler.


2010/3/30 David Timothy Strauss <da...@fourkitchens.com>

> This sounds like the sort of analysis that shouldn't be done in PHP. Have
> you tried Hadoop + Cassandra 0.6?
>
> -----Original Message-----
> From: Julian Simon <js...@jules.com.au>
> Date: Tue, 30 Mar 2010 22:21:22
> To: <us...@cassandra.apache.org>
> Subject: Re: Poor performance; PHP & Thrift to blame
>
> Yes I tested it with and without APC - it had a negligible impact on
> performance.
>
> This didn't surprise me - most of the optimization that APC offers is
> in the parsing of PHP code; seeing as the benchmark is a single PHP
> process the code parsing overhead occurs outside the benchmark loop.
>
> Does anyone have any benchmarks for larger Cassandra queries from PHP
> similar to what I'm trying to do?  The performance bottlenecks don't
> show up on 1,5,10, or even 100 column query sets - only for larger
> sets or query loops.
>
> Anyone doing time series analysis?  This is the sort of use case where
> I'd expect to see much larger query sets.
>
> I suppose Facebook and Digg are only pulling out small column sets, so
> they wouldn't necessarily notice this issue.
>
>
>
> On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
> <da...@fourkitchens.com> wrote:
> > Without APC, there should be even more of an improvement with the Thrift
> PHP extension.
> >
> > ----- "Rauan Maemirov" <ra...@maemirov.com> wrote:
> >
> >> What about APC? Did you turn it on?
> >>
> >
> > --
> > David Strauss
> >   | david@fourkitchens.com
> >   | +1 512 577 5827 [mobile]
> > Four Kitchens
> >   | http://fourkitchens.com
> >   | +1 512 454 6659 [office]
> >   | +1 512 870 8453 [direct]
> >
>

Re: Poor performance; PHP & Thrift to blame

Posted by David Timothy Strauss <da...@fourkitchens.com>.
This sounds like the sort of analysis that shouldn't be done in PHP. Have you tried Hadoop + Cassandra 0.6?

-----Original Message-----
From: Julian Simon <js...@jules.com.au>
Date: Tue, 30 Mar 2010 22:21:22 
To: <us...@cassandra.apache.org>
Subject: Re: Poor performance; PHP & Thrift to blame

Yes I tested it with and without APC - it had a negligible impact on
performance.

This didn't surprise me - most of the optimization that APC offers is
in the parsing of PHP code; seeing as the benchmark is a single PHP
process the code parsing overhead occurs outside the benchmark loop.

Does anyone have any benchmarks for larger Cassandra queries from PHP
similar to what I'm trying to do?  The performance bottlenecks don't
show up on 1,5,10, or even 100 column query sets - only for larger
sets or query loops.

Anyone doing time series analysis?  This is the sort of use case where
I'd expect to see much larger query sets.

I suppose Facebook and Digg are only pulling out small column sets, so
they wouldn't necessarily notice this issue.



On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
<da...@fourkitchens.com> wrote:
> Without APC, there should be even more of an improvement with the Thrift PHP extension.
>
> ----- "Rauan Maemirov" <ra...@maemirov.com> wrote:
>
>> What about APC? Did you turn it on?
>>
>
> --
> David Strauss
>   | david@fourkitchens.com
>   | +1 512 577 5827 [mobile]
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>

Re: Poor performance; PHP & Thrift to blame

Posted by Julian Simon <js...@jules.com.au>.
Yes I tested it with and without APC - it had a negligible impact on
performance.

This didn't surprise me - most of the optimization that APC offers is
in the parsing of PHP code; seeing as the benchmark is a single PHP
process the code parsing overhead occurs outside the benchmark loop.
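
A benchmark of that shape can be sketched as follows (Python, with a stand-in workload; `fake_query` is a placeholder, not a Cassandra call): one-off setup happens before the clock starts, and only the repeated operation is timed.

```python
import time

def fake_query(columns):
    """Stand-in for a client call; builds a list of `columns` items."""
    return [str(i) for i in range(columns)]

def benchmark(operation, iterations):
    """Time only the loop body; setup costs stay outside the measurement."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    elapsed = time.perf_counter() - start
    return elapsed / iterations

# Setup (imports, connection, code loading) would go here, untimed.
average = benchmark(lambda: fake_query(2000), iterations=50)
print("average per call: %.6f s" % average)
```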

Does anyone have any benchmarks for larger Cassandra queries from PHP
similar to what I'm trying to do?  The performance bottlenecks don't
show up on 1,5,10, or even 100 column query sets - only for larger
sets or query loops.

Anyone doing time series analysis?  This is the sort of use case where
I'd expect to see much larger query sets.

I suppose Facebook and Digg are only pulling out small column sets, so
they wouldn't necessarily notice this issue.



On Tue, Mar 30, 2010 at 8:00 PM, David Timothy Strauss
<da...@fourkitchens.com> wrote:
> Without APC, there should be even more of an improvement with the Thrift PHP extension.
>
> ----- "Rauan Maemirov" <ra...@maemirov.com> wrote:
>
>> What about APC? Did you turn it on?
>>
>
> --
> David Strauss
>   | david@fourkitchens.com
>   | +1 512 577 5827 [mobile]
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>

Re: Poor performance; PHP & Thrift to blame

Posted by David Timothy Strauss <da...@fourkitchens.com>.
Without APC, there should be even more of an improvement with the Thrift PHP extension.

----- "Rauan Maemirov" <ra...@maemirov.com> wrote:

> What about APC? Did you turn it on?
> 

-- 
David Strauss
   | david@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]

Re: Poor performance; PHP & Thrift to blame

Posted by Rauan Maemirov <ra...@maemirov.com>.
What about APC? Did you turn it on?
