Posted to user@cassandra.apache.org by Alexei Bakanov <ru...@gmail.com> on 2012/12/18 09:44:34 UTC

TTL on SecondaryIndex Columns. A bug?

Hi,

We are having an issue with TTL on secondary index columns. We get 0
rows in return when running queries on indexed columns that have a TTL.
Everything works fine with small amounts of data, but once we get over
a certain threshold it looks like older rows disappear from the index.
In the example below we create 70 rows with 45k columns each, plus one
indexed column with just the row key as its value, so we have one row per
indexed value. When the script has finished, the index contains only rows
66-69. Rows 0-65 are gone from the index.
Writing 'indexedColumn' without a TTL fixes the problem.


------------- SCHEMA START -----------------
create keyspace ks123
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {datacenter1 : 1}
  and durable_writes = true;

use ks123;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'AsciiType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : AsciiType,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
------------- SCHEMA FINISH -----------------

------------- POPULATE START -----------------
import pycassa
from pycassa.batch import Mutator

pool = pycassa.ConnectionPool('ks123')
cf = pycassa.ColumnFamily(pool, 'cf1')

for rowKey in xrange(70):
    # One batch per row
    b = Mutator(pool)
    # 45,000 data columns per row, each with a TTL of 7884000 seconds
    for datapoint in xrange(1, 45001):
        b.insert(cf, str(rowKey), {str(datapoint): 'val'}, ttl=7884000)
    # The indexed column holds the row key and gets a slightly longer TTL
    b.insert(cf, str(rowKey), {'indexedColumn': str(rowKey)}, ttl=7887600)
    print 'row %d' % rowKey
    b.send()

pool.dispose()
------------- POPULATE FINISH -----------------
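
For reference, the two TTL values used in the script can be converted to
days and hours. This is only a small arithmetic sketch; the constant and
function names are made up for illustration and are not part of the
original script.

```python
from datetime import timedelta

DATA_TTL = 7884000    # ttl passed for the 45,000 data columns (seconds)
INDEX_TTL = 7887600   # ttl passed for 'indexedColumn' (seconds)


def describe_ttl(seconds):
    """Return the TTL as a timedelta, which prints as days and hours."""
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    print("data columns  : %s" % describe_ttl(DATA_TTL))
    print("indexed column: %s" % describe_ttl(INDEX_TTL))
    print("difference    : %s" % describe_ttl(INDEX_TTL - DATA_TTL))
```

So both TTLs are roughly three months away, and the indexed column is
set to expire one hour after the data columns.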

------------- QUERY START -----------------
[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 2.38 msec(s).

[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355818765548964, ttl=7884000)
...
=> (column=10087, value=val, timestamp=1355818766075538, ttl=7884000)
=> (column=indexedColumn, value=66, timestamp=1355818768119334, ttl=7887600)

1 Row Returned.
Elapsed time: 31 msec(s).
------------- QUERY FINISH -----------------

This is all using Cassandra 1.1.7 with default settings.

Best regards,

Alexei Bakanov
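
The manual cassandra-cli checks above could also be scripted to find
every indexed value that no longer returns a row. The following is a
rough sketch, not something from the original post: the helper names
find_missing_index_values and make_pycassa_query are hypothetical, and
it assumes pycassa is installed and that the keyspace and column family
from the schema above exist.

```python
def find_missing_index_values(query, expected_values):
    """Return the indexed values for which the lookup yields no row.

    `query` is any callable that takes an indexed value (a string) and
    returns an iterable of (row_key, columns) pairs.
    """
    missing = []
    for value in expected_values:
        rows = list(query(value))
        if not rows:
            missing.append(value)
    return missing


def make_pycassa_query(cf, column_name='indexedColumn'):
    """Build a `query` callable backed by a pycassa ColumnFamily."""
    # Imported lazily so the helper above can be used without pycassa.
    from pycassa.index import create_index_clause, create_index_expression

    def query(value):
        expr = create_index_expression(column_name, value)
        clause = create_index_clause([expr], count=1)
        # Only row presence matters here, so fetch a single column.
        return cf.get_indexed_slices(clause, column_count=1)

    return query
```

Against a live node this would be called as
find_missing_index_values(make_pycassa_query(cf), [str(k) for k in range(70)]),
with cf being the ColumnFamily object from the populate script. For the
run described above, the expectation would be the values '0' to '65'.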

Re: TTL on SecondaryIndex Columns. A bug?

Posted by Alexei Bakanov <ru...@gmail.com>.

Great stuff, Aaron. Thanks for your time


On 20 December 2012 05:10, aaron morton <aa...@thelastpickle.com> wrote:
> Well that was fun https://issues.apache.org/jira/browse/CASSANDRA-5079
>
> Just testing my idea of a fix now.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/12/2012, at 10:33 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
> Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M
>
> Done, and I now get your repro case…
>
> [default@ks123] get cf1 where 'indexedColumn'='65';
>
> 0 Row Returned.
> Elapsed time: 1.44 msec(s).
>
>
> [default@ks123] get cf1 where 'indexedColumn'='66';
> -------------------
> RowKey: 66
> => (column=1, value=val, timestamp=1355952222439049, ttl=7884000)
> => (column=10, value=val, timestamp=1355952222439269, ttl=7884000)
> ...
> => (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)
>
> Looking into it now.
>
> Thanks
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/12/2012, at 9:56 PM, Roland Gude <ro...@ez.no> wrote:
>
> I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
> Unfortunately, apart from me, no one has been able to reproduce it yet.
>
> Check whether the data is available before and after compaction.
> If you use leveled compaction it is hard to test, because you cannot trigger
> compaction manually.
>
> -----Original Message-----
> From: Alexei Bakanov [mailto:russisk@gmail.com]
> Sent: Wednesday, 19 December 2012 09:35
> To: user@cassandra.apache.org
> Subject: Re: TTL on SecondaryIndex Columns. A bug?
>
> I'm running on a single node on my laptop.
> It looks like the point at which rows disappear from the index depends on the
> JVM memory settings. With more memory, more data has to be fed in before
> things start disappearing.
> Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M
>
> To be sure, try to get rows for 'indexedColumn'='1':
>
> [default@ks123] get cf1 where 'indexedColumn'='1';
>
> 0 Row Returned.
>
> Thanks
>
>
> On 19 December 2012 05:15, aaron morton <aa...@thelastpickle.com> wrote:
>
> Thanks for the nice steps to reproduce.
>
> I ran this on my MBP using C* 1.1.7 and got the expected results: both
> gets returned a row.
>
> Were you running against a single node or a cluster? If a cluster, did
> you change the CL? cassandra-cli defaults to ONE.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>

Re: TTL on SecondaryIndex Columns. A bug?

Posted by cs...@orange.com.

Nice job Aaron,

AFAIU, you now set gc_before to the current time for secondary indexes. Before your patch it was set to Integer.MAX_VALUE, so the removeDeletedStandard function was testing if (column.getLocalDeletionTime() < MAX_VALUE), which is always true, and so it was removing all rows from the secondary index. Am I right?

--
Cyril SCETBON
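
The comparison described in this message can be illustrated with a
small model. This is only a sketch of the reasoning, not Cassandra's
actual code: the function names are hypothetical, and it assumes that
an expiring column's local deletion time is its write time plus its
TTL, both in seconds.

```python
INT_MAX = 2 ** 31 - 1  # value of Java's Integer.MAX_VALUE


def is_purgeable(local_deletion_time, gc_before):
    """Model of the check: a column is dropped if it expires before gc_before."""
    return local_deletion_time < gc_before


def surviving_entries(entries, gc_before):
    """Keep only the index entries that the check above would not drop.

    `entries` maps an indexed value to its local deletion time (seconds).
    """
    return dict((value, t) for value, t in entries.items()
                if not is_purgeable(t, gc_before))


if __name__ == "__main__":
    now = 1355952222   # write time in seconds, taken from the thread's timestamps
    ttl = 7887600      # TTL of 'indexedColumn' in the repro script
    entries = {'65': now + ttl, '66': now + ttl}

    # gc_before = Integer.MAX_VALUE: every entry compares as already expired
    print(surviving_entries(entries, INT_MAX))
    # gc_before = current time: entries whose TTL has not elapsed are kept
    print(surviving_entries(entries, now))
```

In this model any realistic expiry time is below Integer.MAX_VALUE, so
every entry carrying a TTL is dropped as soon as the check runs, while
a gc_before equal to the current time only drops entries that have
really expired.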

On Dec 20, 2012, at 9:28 PM, aaron morton <aa...@thelastpickle.com> wrote:

Yes, but they will get compacted away again unless the patch is there.

It's a small patch, so you should be able to apply it easily enough if you need a fix ASAP.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: TTL on SecondaryIndex Columns. A bug?

Posted by aaron morton <aa...@thelastpickle.com>.

Yes, but they will get compacted away again unless the patch is there. 

It's a small patch, so you should be able to apply it easily enough if you need a fix ASAP.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 20/12/2012, at 5:27 PM, B. Todd Burruss <bt...@gmail.com> wrote:

> i believe we have hit this as well.  if you use nodetool to
> rebuild_index, does it work?

Re: TTL on SecondaryIndex Columns. A bug?

Posted by "B. Todd Burruss" <bt...@gmail.com>.

I believe we have hit this as well. If you run nodetool
rebuild_index, does it work?

On Wed, Dec 19, 2012 at 8:10 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Well that was fun https://issues.apache.org/jira/browse/CASSANDRA-5079
>
> Just testing my idea of a fix now.
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>

Re: TTL on SecondaryIndex Columns. A bug?

Posted by aaron morton <aa...@thelastpickle.com>.

Well that was fun https://issues.apache.org/jira/browse/CASSANDRA-5079

Just testing my idea of a fix now.
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: TTL on SecondaryIndex Columns. A bug?

Posted by aaron morton <aa...@thelastpickle.com>.

> Please try to run Cassandra with -Xms1927M -Xmx1927M -Xmn400M
Done, and I now get your repro case:

[default@ks123] get cf1 where 'indexedColumn'='65';

0 Row Returned.
Elapsed time: 1.44 msec(s).


[default@ks123] get cf1 where 'indexedColumn'='66';
-------------------
RowKey: 66
=> (column=1, value=val, timestamp=1355952222439049, ttl=7884000)
=> (column=10, value=val, timestamp=1355952222439269, ttl=7884000)
...
=> (column=indexedColumn, value=66, timestamp=1355952223881937, ttl=7887600)

Looking into it now. 

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
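
Note on the TTL values in the output above: both are roughly 91 days, while the rows already go missing by the time the populate script finishes, so ordinary TTL expiry does not look like a plausible explanation. A minimal Python 3 sketch of the arithmetic follows; the constant and function names are illustrative and not taken from the thread.

```python
# Convert the TTL values shown in the cassandra-cli output into days.
SECONDS_PER_DAY = 24 * 60 * 60

DATA_TTL = 7884000   # ttl on the 45,000 data columns of each row
INDEX_TTL = 7887600  # ttl on 'indexedColumn'


def ttl_in_days(ttl_seconds):
    """Return a TTL given in seconds as a number of days."""
    return ttl_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print("data columns expire after %.2f days" % ttl_in_days(DATA_TTL))
    print("indexed column expires after %.2f days" % ttl_in_days(INDEX_TTL))
    print("difference: %d seconds" % (INDEX_TTL - DATA_TTL))
```

The indexed column is written with a TTL one hour longer than the data columns, so in normal operation it should outlive them rather than vanish first.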


Re: TTL on SecondaryIndex Columns. A bug?

Posted by Roland Gude <ro...@ez.no>.

I think this might be https://issues.apache.org/jira/browse/CASSANDRA-4670
Unfortunately, apart from me, no one has been able to reproduce it yet.

Check whether the data is available before and after compaction.
If you use leveled compaction this is hard to test, because you cannot trigger compaction manually.
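
The before/after check could be sketched as follows. This is only an outline under assumptions: it relies on the standard `nodetool flush` and `nodetool compact` subcommands, assumes nodetool talks to the local node by default, and the helper name is hypothetical. The schema in the original post uses SizeTieredCompactionStrategy, so a manual compaction should be possible there. The Python 3 function only builds the list of steps; it does not run anything.

```python
def compaction_check_steps(keyspace, column_family, indexed_column, value):
    """Return (tool, command) pairs for checking an index lookup
    before and after a flush and a manual compaction."""
    query = "get %s where '%s'='%s';" % (column_family, indexed_column, value)
    return [
        ("cassandra-cli", query),  # lookup while data may still be in memtables
        ("shell", "nodetool flush %s %s" % (keyspace, column_family)),
        ("cassandra-cli", query),  # lookup after flushing to SSTables
        ("shell", "nodetool compact %s %s" % (keyspace, column_family)),
        ("cassandra-cli", query),  # lookup after manual compaction
    ]


if __name__ == "__main__":
    for tool, command in compaction_check_steps("ks123", "cf1", "indexedColumn", "1"):
        print("%-14s %s" % (tool, command))
```

If the row is returned before one of these steps and missing after it, that step narrows down where the index entry is lost.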


Re: TTL on SecondaryIndex Columns. A bug?

Posted by Alexei Bakanov <ru...@gmail.com>.

I'm running on a single node on my laptop.
It looks like the point at which rows disappear from the index depends
on the JVM memory settings: with more memory, more data has to be fed
in before rows start disappearing.
Please try running Cassandra with -Xms1927M -Xmx1927M -Xmn400M

To be sure, try to get rows for 'indexedColumn'='1':

[default@ks123] get cf1 where 'indexedColumn'='1';

0 Row Returned.

Thanks
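
To see exactly which indexed values have dropped out, the same query can be generated for every row key written by the populate script (0 to 69). A small Python 3 sketch; the function name is hypothetical, and the defaults simply mirror the names used in this thread.

```python
def index_queries(column_family="cf1", column="indexedColumn", row_count=70):
    """Build one cassandra-cli index lookup per row key 0..row_count-1."""
    return [
        "get %s where '%s'='%d';" % (column_family, column, key)
        for key in range(row_count)
    ]


if __name__ == "__main__":
    print("use ks123;")
    for statement in index_queries():
        print(statement)
```

Redirect the output to a file and, assuming your cassandra-cli supports loading statements from a file (the -f option), run it as a batch. Each value should return exactly one row; any "0 Row Returned." marks a key that is missing from the index.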



Re: TTL on SecondaryIndex Columns. A bug?

Posted by aaron morton <aa...@thelastpickle.com>.

Thanks for the nice steps to reproduce. 

I ran this on my MBP using C* 1.1.7 and got the expected results: both gets returned a row.

Were you running against a single node or a cluster? If a cluster, did you change the CL? cassandra-cli defaults to ONE.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
