Posted to user@cassandra.apache.org by Steven Liu <st...@datafeedfile.com> on 2011/03/11 23:34:19 UTC

Write speed roughly 1/10 of expected.

We are using the latest phpcassa
(phpcassa-0.7.a.2.tar.gz<https://github.com/downloads/thobbs/phpcassa/phpcassa-0.7.a.2.tar.gz>)
and Cassandra 0.7.3. We have inserted 12+ million documents into one
column family with the following keyspace/column family settings:

Keyspace: dffl:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 4
  Column Families:
    ColumnFamily: product
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 0.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 1.1578125/247/60
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Built indexes: []

It took 219 minutes to insert the 12+ million docs, which translates to about
913 docs/second using batch_insert with 1250 documents per batch.
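
For concreteness, the insert loop looks roughly like the sketch below. This
is a minimal sketch assuming the 0.7-era phpcassa API; the include paths,
server list, and connection setup are illustrative and should be checked
against the README of the installed release:

    <?php
    require_once('phpcassa/connection.php');
    require_once('phpcassa/columnfamily.php');

    // Illustrative connection setup; keyspace and CF names are from the post.
    $servers = array('127.0.0.1:9160');
    $conn    = new Connection('dffl', $servers);
    $cf      = new ColumnFamily($conn, 'product');

    // $docs: row key => array of the document's columns (name => value).
    // array_chunk() with preserve_keys keeps the row keys intact.
    foreach (array_chunk($docs, 1250, true) as $batch) {
        // One batch_insert() call sends all 1250 rows in a single
        // Thrift batch_mutate round trip.
        $cf->batch_insert($batch);
    }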

We have a cluster of 10 nodes, each running 2 x Xeon 3.6GHz with 8GB of memory
and RAID 5 SCSI U320, and we expected better performance. We do have the
binary accelerator installed (actually, we went through the PHP client and
removed the $bin_accel checks so it always uses the accelerator).
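
One quick sanity check is to confirm at runtime that the Thrift C extension
is actually loaded, since the client falls back to the much slower pure-PHP
serializer when it is not. The extension and function names below are the
ones shipped with the Thrift PHP bindings:

    <?php
    // If either check prints false, writes are being serialized in pure PHP
    // rather than by the accelerated C extension.
    var_dump(extension_loaded('thrift_protocol'));
    var_dump(function_exists('thrift_protocol_write_binary'));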

Does anybody have ideas on common gotchas that could cause it to be this
slow?

Re: Write speed roughly 1/10 of expected.

Posted by Tyler Hobbs <ty...@datastax.com>.
>
> Re: Mr. Hobbs,
>
> Did you mean "which has the benefit of THRIFT-638, while 0.7.a.2 does not"
> (instead of 0.7.a.3)? 0.7.a.3 was the latest version of phpcassa we could
> find on GitHub. We installed 0.7.a.3 with its C extension and didn't see an
> improvement. Is there a newer version with the THRIFT-638 fix?
>

Neither 0.7.a.2 nor 0.7.a.3 includes the fix for THRIFT-638.  The fix *is*
included in the master branch, and a release with this fix should happen
sometime soon.

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library

Re: Write speed roughly 1/10 of expected.

Posted by Steven Liu <st...@datafeedfile.com>.
Re: Mr. Schuller,

The test documents are very small (a few lines of text each). The test data
model is a standard CF, with each document corresponding to a row containing
9-12 columns. We are using a single client issuing sequential batch_insert
calls (which probably map to batch_mutate under the hood), so it is very
possible that that's the bottleneck.
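
For reference, each entry passed to batch_insert has roughly this shape (the
row key and column names below are purely illustrative):

    <?php
    // One document = one row with 9-12 columns (UTF8Type comparator).
    $batch = array(
        'doc:1000001' => array(
            'title'    => 'Example product',
            'merchant' => 'acme',
            'price'    => '19.99',
            // ... remaining columns, 9-12 in total per document
        ),
    );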

Re: Mr. Hobbs,

Did you mean "which has the benefit of THRIFT-638, while 0.7.a.2 does not"
(instead of 0.7.a.3)? 0.7.a.3 was the latest version of phpcassa we could
find on GitHub. We installed 0.7.a.3 with its C extension and didn't see an
improvement. Is there a newer version with the THRIFT-638 fix?

Steve


On Fri, Mar 11, 2011 at 4:55 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> (I have no idea how fast phpcassa is.)
>>
>
> The current master branch (which has the benefit of THRIFT-638<https://issues.apache.org/jira/browse/THRIFT-638>,
> while 0.7.a.3 does not) can insert about 3k individual rows a second against
> a local Cassandra instance.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax <http://datastax.com/>
> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
> Python client library
>
>

Re: Write speed roughly 1/10 of expected.

Posted by Tyler Hobbs <ty...@datastax.com>.
>
> (I have no idea how fast phpcassa is.)
>

The current master branch (which has the benefit of
THRIFT-638<https://issues.apache.org/jira/browse/THRIFT-638>,
while 0.7.a.3 does not) can insert about 3k individual rows a second against
a local Cassandra instance.

-- 
Tyler Hobbs
Software Engineer, DataStax <http://datastax.com/>
Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
Python client library

Re: Write speed roughly 1/10 of expected.

Posted by Peter Schuller <pe...@infidyne.com>.
> It took 219 minutes to insert the 12+ million docs, which translates to about
> 913 docs/second using batch_insert with 1250 documents per batch.

How big are the documents and/or how big is the resulting data when loaded?

What is your data model - is each document a single column? Or a row
containing multiple columns? "913 docs/second" can be low or high or
expected, very much depending on what that means in terms of rows,
columns and sizes.

Did you observe what the bottleneck was during insertion? Were you
inserting with a single client or with multiple concurrent clients, to make
sure you're not bottlenecked on the client side?
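
If a single client turns out to be the limit, one common workaround in PHP
(which has no threads in a stock CLI build) is to fork several worker
processes, each loading a disjoint slice of the data. A rough sketch, where
the worker count is arbitrary and insert_slice() is a hypothetical helper
wrapping the batch loop shown earlier:

    <?php
    $workers = 8;  // arbitrary; tune to the cluster and client hardware
    for ($i = 0; $i < $workers; $i++) {
        $pid = pcntl_fork();
        if ($pid === 0) {
            // Child: open its own connection (never share the parent's
            // socket) and insert every $workers-th batch, offset by $i.
            insert_slice($i, $workers);  // hypothetical helper
            exit(0);
        }
    }
    // Parent: wait for all children to finish.
    while (pcntl_waitpid(0, $status) !== -1) {}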

(I have no idea how fast phpcassa is.)

-- 
/ Peter Schuller