Posted to user@cassandra.apache.org by Roland Gude <ro...@ez.no> on 2012/09/13 16:46:25 UTC

secondary indexes TTL - strange issues

Hi,

we have been running a system on Cassandra 0.7 that relies heavily on secondary indexes for columns with TTL.
This has been working like a charm, but we are trying hard to move forward with Cassandra and are struggling at this point:

When we put our data into a new cluster (any 1.1.x version, currently 1.1.5), rebuild the indexes and run our system, everything seems to work well until, at some point, index queries no longer return any data at all (note that the TTLs are not due to expire for several months).
Rebuilding the index gives us back the data for a couple of minutes, then it vanishes again.

What seems strange is that compaction apparently is very aggressive:

INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,].  78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s.  Time: 1,272,883ms.
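
To put the figures in that log line into perspective, here is a minimal Python sketch that extracts the numbers and derives the size ratio, the throughput and the duration. The regular expression and the summarize helper are hypothetical and written only for this one message layout; they are not part of Cassandra. The sketch also assumes that "MB" means 2**20 bytes and that the rate is based on the bytes written, which is the reading that reproduces the logged value.

```python
import re

# Log line copied from the message above, joined onto one line.
LOG_LINE = (
    "Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/"
    "Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,].  "
    "78,623,000 to 373,348 (~0% of original) bytes for 83 keys "
    "at 0.000280MB/s.  Time: 1,272,883ms."
)

# Hypothetical pattern for this one message layout; not taken from Cassandra.
PATTERN = re.compile(
    r"([\d,]+) to ([\d,]+) \(~\d+% of original\) bytes "
    r"for ([\d,]+) keys at [\d.]+MB/s\.\s+Time: ([\d,]+)ms"
)


def summarize(line):
    """Return key count, size ratio, throughput and duration for the line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a compaction summary")
    # Strip the thousands separators before converting to integers.
    before, after, keys, millis = (
        int(group.replace(",", "")) for group in match.groups()
    )
    seconds = millis / 1000.0
    return {
        "keys": keys,
        "ratio_percent": 100.0 * after / before,
        # Assumption: MB means 2**20 bytes, rate is based on bytes written.
        "mb_per_second": after / (1024.0 * 1024.0) / seconds,
        "minutes": seconds / 60.0,
    }


if __name__ == "__main__":
    for name, value in summarize(LOG_LINE).items():
        print(name, value)
```

Under those assumptions the compacted index file is roughly 0.47% of its input (the "~0%" in the log is a rounded value), and the compaction took about 21 minutes for 83 keys.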


Actually we have switched to LeveledCompaction. Could it be that leveled compaction does not play nice with indexes?



Re: secondary indexes TTL - strange issues

Posted by Roland Gude <ro...@ez.no>.
Issue created: CASSANDRA-4670 <https://issues.apache.org/jira/browse/CASSANDRA-4670>

I will attach debug logs as soon as possible.




Re: secondary indexes TTL - strange issues

Posted by aaron morton <aa...@thelastpickle.com>.
> Data gets inserted and is accessible via index query for some time. At some point the indexes are completely empty and start filling again (while new data enters the system).
If you can reproduce this, please create a ticket at <https://issues.apache.org/jira/browse/CASSANDRA>.

If you can include DEBUG-level logs, that would be helpful.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: secondary indexes TTL - strange issues

Posted by Roland Gude <ro...@ez.no>.
I am not sure it is compacting an old file: the same thing happens every time I rebuild the index. New files appear, get compacted and vanish.

We have set up a new, smaller cluster with fresh data, and the same thing happens there as well. Data gets inserted and is accessible via index query for some time. At some point the indexes are completely empty and start filling again (while new data enters the system).

I am currently testing with SizeTiered on both the fresh set and the imported set.

For the fresh set (which is significantly smaller), first results suggest that the issue does not happen with SizeTieredCompaction. I have not yet tested everything that comes to mind and will update if something new comes up.

As for the failing query, it is from the CLI:
get EventsByItem where 00000003-0000-1000-0000-000000000000=utf8('someValue');
00000003-0000-1000-0000-000000000000 is a TimeUUID we use as a marker for a time series.
(The same happens with equivalent queries through Astyanax and Hector.)

This is a CF with the issue:

create column family EventsByItem
  with column_type = 'Standard'
  and comparator = 'TimeUUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'BytesType'
  and read_repair_chance = 0.5
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
    {column_name : '00000000-0000-1000-0000-000000000000',
    validation_class : BytesType,
    index_name : 'ebi_mandatorIndex',
    index_type : 0},
    {column_name : '00000002-0000-1000-0000-000000000000',
    validation_class : BytesType,
    index_name : 'ebi_itemidIndex',
    index_type : 0},
    {column_name : '00000003-0000-1000-0000-000000000000',
    validation_class : BytesType,
    index_name : 'ebi_eventtypeIndex',
    index_type : 0}]
  and compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};




Re: secondary indexes TTL - strange issues

Posted by aaron morton <aa...@thelastpickle.com>.
> INFO [CompactionExecutor:181] 2012-09-13 12:58:37,443 CompactionTask.java (line 221) Compacted to [/var/lib/cassandra/data/Eventstore/EventsByItem/Eventstore-EventsByItem.ebi_eventtypeIndex-he-10-Data.db,].  78,623,000 to 373,348 (~0% of original) bytes for 83 keys at 0.000280MB/s.  Time: 1,272,883ms.
There are a lot of weird things here.
It could be leveled compaction compacting an older file for the first time. But that would be a guess.

> Rebuilding the index gives us back the data for a couple of minutes - then it vanishes again.
Are you able to do a test with SizeTieredCompaction?

Are you able to replicate the problem with a fresh testing CF and some test data?

If it's only a problem with imported data, can you provide a sample of the failing query? And maybe the CF definition?

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
