Posted to user@cassandra.apache.org by Alexei Bakanov <ru...@gmail.com> on 2013/02/01 09:03:39 UTC
Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows
Hello,
I've found a combination that doesn't work:
a column family that has a secondary index and caching='ALL', with
data in two datacenters. After I restart the nodes, my
secondary index queries start returning 0 rows.
It happens only when the amount of data goes over a certain threshold, so I
suspect that compactions are involved as well.
Taking out one of the ingredients fixes the problem, and my queries
return rows from the secondary index again.
I suspect that the reporter of this issue is struggling with the same thing:
https://issues.apache.org/jira/browse/CASSANDRA-4785
Here is a sequence of actions that reproduces it with the help of CCM
(the cassandra-topology.properties file is included below):
$ ccm create --cassandra-version 1.2.1 --nodes 2 -p RandomPartitioner testRowCacheDC
$ ccm updateconf 'endpoint_snitch: PropertyFileSnitch'
$ ccm updateconf 'row_cache_size_in_mb: 200'
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node1/conf/
$ cp ~/Downloads/cassandra-topology.properties ~/.ccm/testRowCacheDC/node2/conf/
$ ccm start
$ ccm cli
-> create the keyspace and column family (please find the schema below)
$ python populate_rowcache.py
$ ccm stop (I tried a flush first; it doesn't help)
$ ccm start
$ ccm cli
Connected to: "testRowCacheDC" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.1-SNAPSHOT
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@unknown] use testks;
Authenticated to keyspace: testks
[default@testks] get cf1 where 'indexedColumn'='userId_75';
0 Row Returned.
Elapsed time: 68 msec(s).
My Cassandra instances run with -Xms1927M -Xmx1927M -Xmn400M.
Thanks for your help.
Best regards,
Alexei
------ START cassandra-topology.properties ----------
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
------ FINISH cassandra-topology.properties ----------
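
As a quick sanity check on the topology file above, here is a minimal Python sketch that parses the `address=DC:RACK` lines and confirms that the two CCM nodes land in two different datacenters. The helper name `parse_topology` is hypothetical (it is not part of Cassandra or CCM), and the parser handles only the simple line format shown in this thread.

```python
def parse_topology(text):
    """Parse lines of the form `address=DC:RACK` (hypothetical helper)."""
    nodes = {}
    default = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        address, _, placement = line.partition('=')
        dc, _, rack = placement.partition(':')
        entry = (dc.strip(), rack.strip())
        if address.strip() == 'default':
            default = entry  # placement used for unlisted nodes
        else:
            nodes[address.strip()] = entry
    return nodes, default


# Contents copied from the topology file in this message.
TOPOLOGY = """\
127.0.0.1=DC1:RAC1
127.0.0.2=DC2:RAC1
default=DC1:r1
"""

nodes, default = parse_topology(TOPOLOGY)
datacenters = sorted({dc for dc, _rack in nodes.values()})
print(datacenters)  # ['DC1', 'DC2']
```

With one node per datacenter and strategy_options = {DC2 : 1, DC1 : 1}, each node is expected to hold a full replica of the data.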
------ START cassandra-cli schema -----------
create keyspace testks
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC2 : 1, DC1 : 1}
  and durable_writes = true;

use testks;

create column family cf1
  with column_type = 'Standard'
  and comparator = 'org.apache.cassandra.db.marshal.AsciiType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 1.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'ALL'
  and column_metadata = [
    {column_name : 'indexedColumn',
    validation_class : UTF8Type,
    index_name : 'INDEX1',
    index_type : 0}]
  and compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'};
-------FINISH cassandra-cli schema -----------
------ START populate_rowcache.py -----------
# Python 2 script; requires pycassa.
import pycassa
from pycassa.batch import Mutator

pool = pycassa.ConnectionPool('testks', timeout=5)
cf = pycassa.ColumnFamily(pool, 'cf1')

# 1000 users x 20 items = 20000 rows. Each row gets the indexed column
# plus ten message columns named '0'..'9'.
for userId in xrange(0, 1000):
    print userId
    b = Mutator(pool, queue_size=200)
    for itemId in xrange(20):
        rowKey = 'userId_%s:itemId_%s' % (userId, itemId)
        for message_number in xrange(10):
            b.insert(cf, rowKey, {'indexedColumn': 'userId_%s' % userId,
                                  str(message_number): str(message_number)})
    b.send()  # send anything still queued for this user

pool.dispose()
------ FINISH populate_rowcache.py -----------
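
For reference, the row counts the populate script should produce can be worked out without a cluster. The following is a minimal sketch in plain Python 3 that mirrors the key scheme of populate_rowcache.py; the helper name `row_keys` is hypothetical, and the numbers assume that every write succeeds and that cf1 holds no other data.

```python
from collections import Counter


def row_keys(users=1000, items=20):
    """Yield (row key, indexed value) pairs the way populate_rowcache.py builds them."""
    for user_id in range(users):
        for item_id in range(items):
            yield ('userId_%s:itemId_%s' % (user_id, item_id),
                   'userId_%s' % user_id)


# Count how many rows carry each value of 'indexedColumn'.
index_counts = Counter(indexed for _key, indexed in row_keys())
total_rows = sum(index_counts.values())

print(total_rows)                 # 20000
print(index_counts['userId_75'])  # 20
```

Under these assumptions, the query on 'indexedColumn'='userId_75' should match 20 rows, so "0 Row Returned" means index hits are missing rather than the data being legitimately empty.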
Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows
Posted by Alexei Bakanov <ru...@gmail.com>.
I tried to run it with tracing, but it says 'Scanned 0 rows and matched 0'.
I found an existing issue for this bug:
https://issues.apache.org/jira/browse/CASSANDRA-4973
I made a dtest that reproduces it and attached it to the ticket.
Alexei
On 2 February 2013 23:00, aaron morton <aa...@thelastpickle.com> wrote:
> Can you run the select in cqlsh with tracing enabled (see the cqlsh online
> help)?
>
> If you can replicate it, then please raise a ticket on
> https://issues.apache.org/jira/browse/CASSANDRA and update the email thread.
>
> Thanks
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
Re: Secondary index query + 2 Datacenters + Row Cache + Restart = 0 rows
Posted by aaron morton <aa...@thelastpickle.com>.
Can you run the select in cqlsh with tracing enabled (see the cqlsh online help)?
If you can replicate it, then please raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the email thread.
Thanks
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com