Posted to user@cassandra.apache.org by Donal Zang <za...@ihep.ac.cn> on 2011/06/06 10:05:14 UTC

Re: [SPAM] Re: slow insertion rate with secondary index

On 06/06/2011 05:38, Jonathan Ellis wrote:
> Index updates require read-before-write (to find out what the prior
> version was, if any, and update the index accordingly).  This is
> random i/o.
>
> Index creation on the other hand is a lot of sequential i/o, hence
> more efficient.
>
> So, the classic bulk load advice to ingest data prior to creating
> indexes applies.
Thanks for the explanation!
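
For later readers, here is a minimal sketch of why an indexed write needs a read first. It is a toy in-memory model, not Cassandra's implementation: the class ToyIndexedStore and all of its method names are hypothetical, made up for illustration. It only counts the extra lookups; it does not model disk behaviour (random versus sequential i/o).

```python
from collections import defaultdict


class ToyIndexedStore:
    """Hypothetical in-memory stand-in for a column family with one indexed column."""

    def __init__(self):
        self.rows = {}                  # row key -> value of the indexed column
        self.index = defaultdict(set)   # column value -> set of row keys
        self.reads = 0                  # number of read-before-write lookups

    def insert_unindexed(self, key, value):
        # Blind write: the previous value does not matter.
        self.rows[key] = value

    def insert_indexed(self, key, value):
        # Read-before-write: fetch the prior value so the stale
        # index entry can be removed before the new one is added.
        self.reads += 1
        old = self.rows.get(key)
        if old is not None:
            self.index[old].discard(key)
            if not self.index[old]:
                del self.index[old]
        self.rows[key] = value
        self.index[value].add(key)

    def build_index(self):
        # Bulk build: a single pass over the rows that already exist.
        self.index = defaultdict(set)
        for key, value in self.rows.items():
            self.index[value].add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, ()))


store = ToyIndexedStore()
store.insert_indexed("row1", "red")
store.insert_indexed("row1", "blue")
print(store.lookup("red"))    # [] (the stale entry was removed)
print(store.lookup("blue"))   # ['row1']
print(store.reads)            # 2 (one lookup per indexed write)
```

Loading with insert_unindexed() and calling build_index() once at the end gives the same index without any per-write lookups, which is the bulk load advice in miniature.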

-- 
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zangds@ihep.ac.cn
86 010 8823 6018



Re: [SPAM] Re: [SPAM] Re: slow insertion rate with secondary index

Posted by Donal Zang <za...@ihep.ac.cn>.
On 06/06/2011 14:29, David Boxenhorn wrote:
> Jonathan, are Donal Zang's results (10x slowdown) typical?
>
> On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>     On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang <zangds@ihep.ac.cn> wrote:
>     > Another thing I noticed is : if you first do insertion, and then build the
>     > secondary index use "update column family ...", and then do select based on
>     > the index, the result is not right (seems the index is still being built
>     > though the "update" commands returns quickly).
>
>     That is correct. "describe keyspace" from the cli tells you when an
>     index has finished building.
>
>     --
>     Jonathan Ellis
>     Project Chair, Apache Cassandra
>     co-founder of DataStax, the source for professional Cassandra support
>     http://www.datastax.com
>
>
This seems similar to https://issues.apache.org/jira/browse/CASSANDRA-2470

-- 
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zangds@ihep.ac.cn
86 010 8823 6018


Re: [SPAM] Re: slow insertion rate with secondary index

Posted by Jonathan Ellis <jb...@gmail.com>.
If the rows you are updating are not cached, yes.  (Otherwise maybe 10% slower.)

On Mon, Jun 6, 2011 at 7:29 AM, David Boxenhorn <da...@citypath.com> wrote:
> Jonathan, are Donal Zang's results (10x slowdown) typical?
>
> On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang <za...@ihep.ac.cn> wrote:
>> > Another thing I noticed is : if you first do insertion, and then build the
>> > secondary index use "update column family ...", and then do select based on
>> > the index, the result is not right (seems the index is still being built
>> > though the "update" commands returns quickly).
>>
>> That is correct. "describe keyspace" from the cli tells you when an
>> index has finished building.
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: [SPAM] Re: slow insertion rate with secondary index

Posted by David Boxenhorn <da...@citypath.com>.
Jonathan, are Donal Zang's results (10x slowdown) typical?

On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang <za...@ihep.ac.cn> wrote:
> > Another thing I noticed is : if you first do insertion, and then build the
> > secondary index use "update column family ...", and then do select based on
> > the index, the result is not right (seems the index is still being built
> > though the "update" commands returns quickly).
>
> That is correct. "describe keyspace" from the cli tells you when an
> index has finished building.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: [SPAM] Re: slow insertion rate with secondary index

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang <za...@ihep.ac.cn> wrote:
> Another thing I noticed is : if you first do insertion, and then build the
> secondary index use "update column family ...", and then do select based on
> the index, the result is not right (seems the index is still being built
> though the "update" commands returns quickly).

That is correct. "describe keyspace" from the cli tells you when an
index has finished building.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: [SPAM] Re: slow insertion rate with secondary index

Posted by Donal Zang <za...@ihep.ac.cn>.
On 06/06/2011 10:15, David Boxenhorn wrote:
> Is there really a 10x difference between indexed CFs and non-indexed CFs? 
Well, in my test, it is!
I'm using 0.7.6-2 with 9 nodes, 3 replicas, write_consistency_level QUORUM,
and about 90,000,000 rows (~1K per row).
I use 20 processes, with 20 rows per insertion.
The insertion time for the whole row is about 0.02 seconds without an index.
When I then add a secondary index and update every row with the indexed
column, the insertion time is about 2 seconds.
If I remove the index and update the column, the time is about 0.002 seconds.
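
As a rough check on these numbers, the ratios can be computed directly. This is only a sketch: the variable and function names are my own, the timings are the ones reported above, and the message does not say whether they are per row or per 20-row insertion.

```python
# Timings in seconds, copied from the message above.
insert_no_index = 0.02     # whole-row insert, no index
update_with_index = 2.0    # update of the indexed column, index present
update_no_index = 0.002    # same update after removing the index


def slowdown(slow, fast):
    """Return how many times longer `slow` takes than `fast`."""
    return slow / fast


print(round(slowdown(update_with_index, insert_no_index)))   # 100
print(round(slowdown(update_with_index, update_no_index)))   # 1000
```

Taken at face value, these ratios are larger than the 10x figure quoted elsewhere in the thread; the messages shown here do not explain the difference.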

Another thing I noticed: if you do the insertion first, then build the
secondary index using "update column family ...", and then do a select
based on the index, the result is not right (it seems the index is still
being built even though the "update" command returns quickly). After a
while, get_indexed_slices() times out from time to time (with
pycassa.ConnectionPool('keyspace1', ['host1','host2'], timeout=600,
pool_size=1)).

Does anyone else have similar experiences using secondary indexes?

-- 
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zangds@ihep.ac.cn
86 010 8823 6018



Re: [SPAM] Re: slow insertion rate with secondary index

Posted by David Boxenhorn <da...@citypath.com>.
Is there really a 10x difference between indexed CFs and non-indexed CFs?

On Mon, Jun 6, 2011 at 11:05 AM, Donal Zang <za...@ihep.ac.cn> wrote:

> On 06/06/2011 05:38, Jonathan Ellis wrote:
>
>> Index updates require read-before-write (to find out what the prior
>> version was, if any, and update the index accordingly).  This is
>> random i/o.
>>
>> Index creation on the other hand is a lot of sequential i/o, hence
>> more efficient.
>>
>> So, the classic bulk load advice to ingest data prior to creating
>> indexes applies.
>>
> Thanks for the explanation!
>
> --
> Donal Zang
> Computing Center, IHEP
> 19B YuquanLu, Shijingshan District,Beijing, 100049
> zangds@ihep.ac.cn
> 86 010 8823 6018
>
>
>