Posted to user@cassandra.apache.org by sasha <sa...@gmail.com> on 2010/08/20 09:35:44 UTC

Re: Poor performance; PHP & Thrift to blame

Julian Simon <jsimon <at> jules.com.au> writes:

> 
> Hi,
> 
> I've been trying to benchmark Cassandra for our use case and have been
> seeing poor performance on both writes and (extremely) poor
> performance on reads.
> 
> Using Cassandra 0.5.1 stable & thrift-0.2.0.
> 
> It turns out all the CPU time is going to the PHP client process - the
> JVM operating the Cassandra server isn't breaking much of a sweat.
> 
> For reads the latency is often up to 1 second to fetch a row
> containing ~2000 columns, or around 300ms to fetch a 500-column wide
> row.  This is with get_slice(), and a predicate specifying the start &
> finish range.
> 
> Using cachegrind and inspecting the code inside the Thrift bindings
> makes it pretty clear why the performance is so bad, particularly on
> reads. The biggest culprit is the translation code which casts data
> back and forth into binary representations for sending over the wire
> to the Cassandra server.
> 
> There seems to be some 32-bit specific code which iterates heavily,
> apparently due to a limitation in PHP's implementation of LONGs.
> 
> However, testing on a 64-bit host doesn't yield any performance improvement.
> 
> More surprisingly, if I compile and enable the PHP native thrift
> bindings (following this guide
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> read performance actually degrades by another 50%.  I have verified
> that the Thrift code is recognizing and using the native PHP functions
> provided by the library.
> 
> I've tested all of this on both 32-bit and 64-bit installations of
> both PHP 5.1 & 5.2.  Results are the same in all cases.
> 
> My environment is vanilla CentOS 5.4 server installations inside
> VMware on a 4-core 64-bit host with plenty of RAM and fast disks.
> 
> Has anyone been able to produce decent performance with PHP &
> Cassandra?  If so, how have you done it?
> 
> Thanks,
> Jules
> 
> 


I had exactly the same problem: without the native Thrift bindings,
performance was poor and PHP used too much CPU. But when I compiled and
enabled the native Thrift bindings (following this guide:
https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP),
performance got even worse: it degraded SEVERAL TIMES over (although CPU
usage decreased too).

After some trial and error I discovered that the buffer size matters. I
mean the second and third arguments to
"new TBufferedTransport($socket, X, Y)". The most surprising part is that
it matters much more with the native Thrift bindings than without them.

For example:
- get_range_slices without native Thrift bindings (either small or large
  buffer size): ~1 sec.
- get_range_slices with native Thrift bindings and a small buffer size
  (1024): ~5 sec!
- get_range_slices with native Thrift bindings and a large buffer size
  (40960): ~0.1 sec.

I don't know why!!

P.S.: Cassandra 0.6.3.
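
A minimal sketch of how the buffer-size comparison above could be timed.
Only TBufferedTransport's second and third arguments and the two sizes
(1024 and 40960) come from the message; the include paths, host, port,
keyspace, column family, row/column counts and the helper name
time_range_slices are assumptions for illustration. The class names
follow what the Thrift compiler typically generates for Cassandra 0.6
and may differ in other versions.

```php
<?php
// Sketch only. Paths, host, port, keyspace and column family are assumed.
$GLOBALS['THRIFT_ROOT'] = '/usr/share/php/Thrift';  // assumed install path
require_once $GLOBALS['THRIFT_ROOT'] . '/Thrift.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/transport/TBufferedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'] . '/packages/cassandra/Cassandra.php';

// Hypothetical helper: time one get_range_slices call for a given
// read/write buffer size (in bytes).
function time_range_slices($bufSize) {
    $socket    = new TSocket('127.0.0.1', 9160);  // assumed host and port
    // Second and third arguments are the read and write buffer sizes.
    $transport = new TBufferedTransport($socket, $bufSize, $bufSize);
    // Uses the native thrift_protocol extension when it is loaded.
    $protocol  = new TBinaryProtocolAccelerated($transport);
    $client    = new CassandraClient($protocol);
    $transport->open();

    $parent    = new cassandra_ColumnParent(array('column_family' => 'Standard1'));
    $slice     = new cassandra_SliceRange(
        array('start' => '', 'finish' => '', 'count' => 1000));
    $predicate = new cassandra_SlicePredicate(array('slice_range' => $slice));
    $range     = new cassandra_KeyRange(
        array('start_key' => '', 'end_key' => '', 'count' => 100));

    $t0 = microtime(true);
    $client->get_range_slices('Keyspace1', $parent, $predicate, $range,
                              cassandra_ConsistencyLevel::ONE);
    $elapsed = microtime(true) - $t0;

    $transport->close();
    return $elapsed;
}

// Compare the small and large buffer sizes mentioned above.
foreach (array(1024, 40960) as $size) {
    printf("buffer %5d bytes: %.3f s\n", $size, time_range_slices($size));
}
```

Running it once with the thrift_protocol extension loaded and once
without should show whether the same gap appears in your own setup.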



Re: Poor performance; PHP & Thrift to blame

Posted by Juho Mäkinen <ju...@gmail.com>.
Beware that the native Thrift PHP bindings have a bug which might
change the types of the arguments you pass in. Check out the bug report
I filed:
https://issues.apache.org/jira/browse/THRIFT-796

 - Garo


Re: Poor performance; PHP & Thrift to blame

Posted by Thorvaldsson Justus <ju...@svenskaspel.se>.
Search the mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/
This has already been addressed, and it is a PHP-only issue.
The 5 sec is a timeout; if I remember correctly, it happens because the packet size is too small, or something like that.
You can configure it so that it stops being a problem, but I don't use PHP, so search the mailing list.
