Posted to user@cassandra.apache.org by Vodnok <vo...@gmail.com> on 2011/03/11 17:24:36 UTC

Poor performance on small data set

Hi,

I'm seeing poor performance when fetching a single row.

Here is my dev environment:

- Windows 7
- PHPCassa [PHP 5.3.5]
- Cassandra 0.7.3

Column family:
create column family docs with comparator = 'UTF8Type' and column_type =
'Standard' and rows_cached=100000 and keys_cached=100000;

There are fewer than 1000 rows, and it takes 75-100 ms to get one row by id.
With memcached it takes 2 ms.

I don't know where the problem is: the JVM, Cassandra, or phpcassa?

What can I do to find out where the problem is?

Thank you,

Vodnok

PS: phpcassa recommends using its C extension (but that extension is made
for Linux and I cannot use it on Windows), and I'm not sure the issue comes
from phpcassa anyway.

Re: Poor performance on small data set

Posted by ruslan usifov <ru...@gmail.com>.
Here is a PHP extension for Windows, but you must use the trunk version of Thrift.

2011/3/12 Vodnok <vo...@gmail.com>

> Thank you all for your replies.
>
>
> "nagle + delayed ACK problem": I found a way to address this via regedit,
> but it had no impact on response time.
>
> THRIFT-638: It seems to be a solution, but I don't know how to apply the
> patch in my environment. phpcassa has a C extension, but it's hard for me
> to build a PHP extension.
>
>

Re: Poor performance on small data set

Posted by Sébastien Kondov <vo...@gmail.com>.
Hi,

Just to let you know that I finally compiled the Thrift extension to a .dll,
and performance has improved. I was forced to switch to a VC9 build of PHP,
since VC6 is no longer supported by PHP.

Average access time was pretty bad before (70-100 ms per row), and now it's
5-10 ms. So it's roughly 10x faster, thanks to the new extension .dll and
maybe the VC9 build of PHP.

So that's good news, but 10 ms is still not really good compared to MySQL
or memcached on insert. I'll run new tests on a virtual machine (XP) to
see the impact of Windows 7.


I would like your opinion on these numbers:

CPU: U9400 @ 1.4 GHz
OS: Windows 7 32-bit
RAM: 4 GB
(This is my dev config. In the future, it will run on a Unix server.)

Test: reading/inserting the same row id 1000 times

Read: 7.2 sec to read 1000 rows
Insert: 8.5 sec to insert 1000 rows

strlen of one serialized row = 2604 chars
1 row = 20 columns
When I say "row", I mean something like a MySQL row.
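
For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch in Python (not the PHP script used for the numbers above).
The names time_repeated, per_call_ms and the operation callback are
hypothetical; in a real test the callback would wrap a single get or insert
on the client. It times N calls and converts the total into a per-call
figure, so 7.2 sec over 1000 reads works out to 7.2 ms per read.

```python
import time


def per_call_ms(total_seconds, repetitions):
    """Convert a total duration in seconds into milliseconds per call."""
    return total_seconds / repetitions * 1000.0


def time_repeated(operation, repetitions=1000):
    """Call operation() `repetitions` times.

    Returns (total_seconds, milliseconds_per_call).
    """
    start = time.perf_counter()
    for _ in range(repetitions):
        operation()
    total = time.perf_counter() - start
    return total, per_call_ms(total, repetitions)


# Placeholder workload; replace the lambda with one client read or insert.
total, ms = time_repeated(lambda: None, repetitions=1000)
print("total: %.4f sec, per call: %.4f ms" % (total, ms))
```

Timing the same loop against a no-op and against the real call gives a rough
idea of how much of the time is spent outside the script itself.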


Does this sound reasonable to you?
Is performance limited by the CPU?


Another observation: when I store my row serialized in a single column, I
get a performance boost.

cf[id][serialized]=serialize(row)

read/insert: 2.3 sec per 1000 rows

When the row is read/inserted without serialization: 7-8 sec per 1000 rows

*So performance does not depend on size but on the number of columns*

So my conclusion is that it is better to store a row serialized, since I
will always read all of the row's data at once.
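
The two layouts compared above can be sketched as follows. This is a Python
illustration only: json stands in for PHP's serialize(), and the helper
names pack_row and unpack_row, the column name "serialized" and the sample
values are assumptions made for the example.

```python
import json


def pack_row(row):
    """Collapse a multi-column row into one column holding a serialized blob."""
    return {"serialized": json.dumps(row, sort_keys=True)}


def unpack_row(columns):
    """Rebuild the original row from the single serialized column."""
    return json.loads(columns["serialized"])


# A row with 20 columns, as in the test above (the values are made up).
row = {"col%02d" % i: "value %d" % i for i in range(20)}

# One column to write and read instead of twenty.
packed = pack_row(row)
print(len(row), "columns ->", len(packed), "column")
```

The trade-off is that individual columns can no longer be read or updated on
their own. The whole blob has to be fetched and rewritten each time, so this
only suits rows that are always read in full.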




Thank you,

Vodnok

2011/3/12 Tyler Hobbs <ty...@datastax.com>

> On Sat, Mar 12, 2011 at 6:45 AM, Vodnok <vo...@gmail.com> wrote:
>
>>
>> THRIFT-638: It seems to be a solution, but I don't know how to apply the
>> patch in my environment. phpcassa has a C extension, but it's hard for me
>> to build a PHP extension.
>>
>
> The master branch of phpcassa includes the changes from THRIFT-638.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax <http://datastax.com/>
> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
> Python client library
>
>

Re: Poor performance on small data set

Posted by Tyler Hobbs <ty...@datastax.com>.
On Sat, Mar 12, 2011 at 6:45 AM, Vodnok <vo...@gmail.com> wrote:

>
> THRIFT-638: It seems to be a solution, but I don't know how to apply the
> patch in my environment. phpcassa has a C extension, but it's hard for me
> to build a PHP extension.
>

The master branch of phpcassa includes the changes from THRIFT-638.

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library

Re: Poor performance on small data set

Posted by Vodnok <vo...@gmail.com>.
Thank you all for your replies.


"nagle + delayed ACK problem": I found a way to address this via regedit,
but it had no impact on response time.

THRIFT-638: It seems to be a solution, but I don't know how to apply the
patch in my environment. phpcassa has a C extension, but it's hard for me
to build a PHP extension.

I tried connecting with both localhost and 127.0.0.1: no change.

It's strange, because as I remember it, a week ago performance was better.
Accessing a row took about 115% of the memcached time, and now it takes a
lot more.

I'll try reinstalling Cassandra from scratch.


2011/3/11 Jonathan Ellis <jb...@gmail.com>

> Also: https://issues.apache.org/jira/browse/THRIFT-638
>
> On Fri, Mar 11, 2011 at 10:44 AM, Peter Schuller
> <pe...@infidyne.com> wrote:
> >> There are fewer than 1000 rows, and it takes 75-100 ms to get one row
> >> by id. With memcached it takes 2 ms.
> >>
> >> I don't know where the problem is: the JVM, Cassandra, or phpcassa?
> >>
> >> What can I do to find out where the problem is?
> >
> > I'm not familiar with the PHP client, but this sounds suspiciously
> > like a nagle + delayed ACK problem. The PHP client probably isn't
> > setting the TCP_NODELAY flag (or the equivalent in Windows).
> >
> > Google for "nagle delayed ack" for details.
> >
> > --
> > / Peter Schuller
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: Poor performance on small data set

Posted by Jonathan Ellis <jb...@gmail.com>.
Also: https://issues.apache.org/jira/browse/THRIFT-638

On Fri, Mar 11, 2011 at 10:44 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> There are fewer than 1000 rows, and it takes 75-100 ms to get one row by id.
>> With memcached it takes 2 ms.
>>
>> I don't know where the problem is: the JVM, Cassandra, or phpcassa?
>>
>> What can I do to find out where the problem is?
>
> I'm not familiar with the PHP client, but this sounds suspiciously
> like a nagle + delayed ACK problem. The PHP client probably isn't
> setting the TCP_NODELAY flag (or the equivalent in Windows).
>
> Google for "nagle delayed ack" for details.
>
> --
> / Peter Schuller
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Poor performance on small data set

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Mar 11, 2011 at 11:44 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> There are fewer than 1000 rows, and it takes 75-100 ms to get one row by id.
>> With memcached it takes 2 ms.
>>
>> I don't know where the problem is: the JVM, Cassandra, or phpcassa?
>>
>> What can I do to find out where the problem is?
>
> I'm not familiar with the PHP client, but this sounds suspiciously
> like a nagle + delayed ACK problem. The PHP client probably isn't
> setting the TCP_NODELAY flag (or the equivalent in Windows).
>
> Google for "nagle delayed ack" for details.
>
> --
> / Peter Schuller
>

Also, you will find that setting both rowsCached and keysCached is not
effective. Choose one or the other. (That is not your problem, just an
FYI.)

Re: Poor performance on small data set

Posted by Peter Schuller <pe...@infidyne.com>.
> There are fewer than 1000 rows, and it takes 75-100 ms to get one row by id.
> With memcached it takes 2 ms.
>
> I don't know where the problem is: the JVM, Cassandra, or phpcassa?
>
> What can I do to find out where the problem is?

I'm not familiar with the PHP client, but this sounds suspiciously
like a nagle + delayed ACK problem. The PHP client probably isn't
setting the TCP_NODELAY flag (or the equivalent in Windows).

Google for "nagle delayed ack" for details.
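
To show what setting the flag looks like, here is a minimal sketch in Python
(chosen purely for illustration; this is not phpcassa or Thrift code). It
uses the standard socket module's TCP_NODELAY option to turn off Nagle's
algorithm on a TCP socket. The helper names are hypothetical, and the host
and port in the usage comment are assumptions (9160 is the default Thrift
port in Cassandra 0.7).

```python
import socket


def enable_nodelay(sock):
    """Disable Nagle's algorithm so small writes are sent immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


# Hypothetical usage against a local node (host and port are assumptions):
#   sock = enable_nodelay(socket.create_connection(("127.0.0.1", 9160)))
```

In a real client the option has to be set on the socket that the library's
transport opens, so whether it can be applied without patching depends on
the client library.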

-- 
/ Peter Schuller