Posted to user@cassandra.apache.org by Mohica Jasha <mo...@gmail.com> on 2013/07/02 06:04:52 UTC

very inefficient operation with tombstones

Querying a table with 5,000 tombstones takes 3 minutes to complete!
But querying the same table, with the same data pattern but 10,000 live
entries, takes a fraction of a second.


Details:
1. created the following table:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '1'};
use test;
CREATE TABLE job_index (   stage text,   "timestamp" text,   PRIMARY KEY
(stage, "timestamp"));

2. inserted 5000 entries into the table (scripted in the sketch after step 6):
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00000001' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00000002' );
....
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00004999' );
INSERT INTO job_index (stage, timestamp) VALUES ( 'a', '00005000' );

3. flushed the table:
nodetool flush test job_index

4. deleted the 5000 entries:
DELETE from job_index WHERE stage ='a' AND timestamp = '00000001' ;
DELETE from job_index WHERE stage ='a' AND timestamp = '00000002' ;
...
DELETE from job_index WHERE stage ='a' AND timestamp = '00004999' ;
DELETE from job_index WHERE stage ='a' AND timestamp = '00005000' ;

5. flushed the table:
nodetool flush test job_index

6. querying the table takes 3 minutes to complete:
cqlsh:test> SELECT * from job_index limit 20000;
tracing:
http://pastebin.com/jH2rZN2X
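
For anyone who wants to reproduce steps 1-5 without typing 10,000
statements, here is a minimal sketch. The DataStax Python driver
(cassandra-driver) and a nodetool binary on the PATH are assumptions on my
part; any client that can issue the statements above would do.

# Repro sketch for steps 1-5: one partition, 5000 inserts, flush,
# 5000 deletes, flush. Assumes a single local node on 127.0.0.1.
import subprocess
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

# step 1: keyspace and table
session.execute("CREATE KEYSPACE test WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': '1'}")
session.set_keyspace('test')
session.execute('CREATE TABLE job_index (stage text, "timestamp" text, '
                'PRIMARY KEY (stage, "timestamp"))')

# step 2: 5000 clustering rows under a single partition key
for i in range(1, 5001):
    session.execute("INSERT INTO job_index (stage, timestamp) VALUES ('a', %s)",
                    ['%08d' % i])

# step 3: flush so the live cells land in an sstable
subprocess.check_call(['nodetool', 'flush', 'test', 'job_index'])

# step 4: delete every row, turning each cell into a tombstone
for i in range(1, 5001):
    session.execute("DELETE FROM job_index WHERE stage = 'a' AND timestamp = %s",
                    ['%08d' % i])

# step 5: flush again so the tombstones land in a second sstable
subprocess.check_call(['nodetool', 'flush', 'test', 'job_index'])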

While the query was executing I saw a lot of GC entries in Cassandra's
log:
DEBUG [ScheduledTasks:1] 2013-07-01 23:47:59,221 GCInspector.java (line 121) GC for ParNew: 30 ms for 6 collections, 263993608 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:00,222 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 186209616 used; max is 2093809664
DEBUG [ScheduledTasks:1] 2013-07-01 23:48:01,223 GCInspector.java (line 121) GC for ParNew: 29 ms for 6 collections, 108731464 used; max is 2093809664

It seems that something very inefficient is happening in managing
tombstones.

If I start with a clean table and do the following (scripted below):
1. insert 5000 entries
2. flush to disk
3. insert new 5000 entries
4. flush to disk
Querying the job_index table for all 10,000 entries takes a fraction of a
second to complete:
tracing:
http://pastebin.com/scUN9JrP
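
The control case can be scripted the same way, under the same assumptions
as the sketch above, against a freshly created job_index table:

# Control case: 10,000 live cells split across two flushed sstables,
# no tombstones. Assumes the test keyspace and job_index table already exist.
import subprocess
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('test')

for first, last in [(1, 5000), (5001, 10000)]:
    for i in range(first, last + 1):
        session.execute("INSERT INTO job_index (stage, timestamp) VALUES ('a', %s)",
                        ['%08d' % i])
    # flush after each batch so the rows end up in separate sstables
    subprocess.check_call(['nodetool', 'flush', 'test', 'job_index'])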

The fact that iterating over 5,000 tombstones takes 3 minutes while
iterating over 10,000 live cells takes a fraction of a second suggests that
something very inefficient is happening in how tombstones are managed.

I would appreciate it if a developer could look into this.

-M

Re: very inefficient operation with tombstones

Posted by Robert Wille <rw...@fold3.com>.
I've seen the same thing

Re: very inefficient operation with tombstones

Posted by Sylvain Lebresne <sy...@datastax.com>.
This is https://issues.apache.org/jira/browse/CASSANDRA-5677.

--
Sylvain

