Posted to user@cassandra.apache.org by DuyHai Doan <do...@gmail.com> on 2014/05/12 07:37:59 UTC

Re: Question about READS in a multi DC environment.

Isn't read repair supposed to be done asynchronously in the background?


On Mon, May 12, 2014 at 2:07 AM, graham sanderson <gr...@vast.com> wrote:

> You have a read_repair_chance of 1.0 which is probably why your query is
> hitting all data centers.
>
> On May 11, 2014, at 3:44 PM, Mark Farnan <de...@petrolink.com> wrote:
>
> > I'm trying to understand READ load in Cassandra across a multi-datacenter
> > cluster (specifically, why it seems to be hitting more than one DC) and
> > hope someone can help.
> >
> > From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be
> > hitting all 3 datacenters, rather than just the one I'm connected to. I
> > see 'Read 1001 live and 0 tombstoned cells' from EACH of the 3 DCs in the
> > trace, which seems wrong.
> > I have tried every consistency level, same result. This is also the same
> > from my C# code via the DataStax driver (where I first noticed the issue).
> >
> > Can someone please shed some light on what is occurring? Specifically, I
> > don't want a query on one DC going anywhere near the other 2 as a rule,
> > as in production these DCs will be across slower links.
> >
> >
> > Query: (NOTE: Whilst this uses a KairosDB table, I'm just playing with
> > queries against it, as it has 100k columns in this key for testing.)
> >
> > cqlsh:kairosdb> consistency local_one
> > Consistency level set to LOCAL_ONE.
> >
> > cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d70340000000145b514a400726f6f6d3d6f66666963653a limit 1000;
> >
> > ... Some returned data rows listed here, which I've removed ...
> >
> > <CassandraQuery.txt>
> > Query Response Trace:
> >
> > activity                                                                                                                                 | timestamp    | source         | source_elapsed
> > ------------------------------------------------------------------------------------------------------------------------------------------+--------------+----------------+----------------
> >                                                                                                                       execute_cql3_query | 07:18:12,692 | 192.168.25.111 |              0
> >                                                                                                    Message received from /192.168.25.111 | 07:18:00,706 | 192.168.25.131 |             50
> >                                                                                          Executing single-partition query on data_points | 07:18:00,707 | 192.168.25.131 |            760
> >                                                                                                             Acquiring sstable references | 07:18:00,707 | 192.168.25.131 |            814
> >                                                                                                              Merging memtable tombstones | 07:18:00,707 | 192.168.25.131 |            924
> >                                                                                                 Bloom filter allows skipping sstable 191 | 07:18:00,707 | 192.168.25.131 |           1050
> >                                                                                                 Bloom filter allows skipping sstable 190 | 07:18:00,707 | 192.168.25.131 |           1166
> >                                                                                                            Key cache hit for sstable 189 | 07:18:00,707 | 192.168.25.131 |           1275
> >                                                                                              Seeking to partition beginning in data file | 07:18:00,707 | 192.168.25.131 |           1293
> >                                                                Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:00,708 | 192.168.25.131 |           2173
> >                                                                                               Merging data from memtables and 1 sstables | 07:18:00,708 | 192.168.25.131 |           2195
> >                                                                                                    Read 1001 live and 0 tombstoned cells | 07:18:00,709 | 192.168.25.131 |           3259
> >                                                                                                    Enqueuing response to /192.168.25.111 | 07:18:00,710 | 192.168.25.131 |           4006
> >                                                                                                       Sending message to /192.168.25.111 | 07:18:00,710 | 192.168.25.131 |           4210
> > Parsing select * from data_points where key = 0x6d61726c796e2e746573742e74656d70340000000145b514a400726f6f6d3d6f66666963653a limit 1000; | 07:18:12,692 | 192.168.25.111 |             52
> >                                                                                                                      Preparing statement | 07:18:12,692 | 192.168.25.111 |            257
> >                                                                                                       Sending message to /192.168.25.121 | 07:18:12,693 | 192.168.25.111 |           1099
> >                                                                                                       Sending message to /192.168.25.131 | 07:18:12,693 | 192.168.25.111 |           1254
> >                                                                                          Executing single-partition query on data_points | 07:18:12,693 | 192.168.25.111 |           1269
> >                                                                                                             Acquiring sstable references | 07:18:12,693 | 192.168.25.111 |           1284
> >                                                                                                              Merging memtable tombstones | 07:18:12,694 | 192.168.25.111 |           1315
> >                                                                                                            Key cache hit for sstable 205 | 07:18:12,694 | 192.168.25.111 |           1592
> >                                                                                              Seeking to partition beginning in data file | 07:18:12,694 | 192.168.25.111 |           1606
> >                                                                Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | 07:18:12,695 | 192.168.25.111 |           2423
> >                                                                                               Merging data from memtables and 1 sstables | 07:18:12,695 | 192.168.25.111 |           2498
> >                                                                                                    Read 1001 live and 0 tombstoned cells | 07:18:12,695 | 192.168.25.111 |           3167
> >                                                                                                    Message received from /192.168.25.121 | 07:18:12,697 | 192.168.25.111 |           null
> >                                                                                                 Processing response from /192.168.25.121 | 07:18:12,697 | 192.168.25.111 |           null
> >                                                                                                    Message received from /192.168.25.131 | 07:18:12,699 | 192.168.25.111 |           null
> >                                                                                                 Processing response from /192.168.25.131 | 07:18:12,699 | 192.168.25.111 |           null
> >                                                                                                    Message received from /192.168.25.111 | 07:19:49,432 | 192.168.25.121 |             68
> >                                                                                          Executing single-partition query on data_points | 07:19:49,433 | 192.168.25.121 |            824
> >                                                                                                             Acquiring sstable references | 07:19:49,433 | 192.168.25.121 |            840
> >                                                                                                              Merging memtable tombstones | 07:19:49,433 | 192.168.25.121 |            898
> >                                                                                                 Bloom filter allows skipping sstable 193 | 07:19:49,433 | 192.168.25.121 |            983
> >                                                                                                            Key cache hit for sstable 192 | 07:19:49,433 | 192.168.25.121 |           1055
> >                                                                                              Seeking to partition beginning in data file | 07:19:49,433 | 192.168.25.121 |           1073
> >                                                                Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | 07:19:49,434 | 192.168.25.121 |           1803
> >                                                                                               Merging data from memtables and 1 sstables | 07:19:49,434 | 192.168.25.121 |           1839
> >                                                                                                    Read 1001 live and 0 tombstoned cells | 07:19:49,434 | 192.168.25.121 |           2518
> >                                                                                                    Enqueuing response to /192.168.25.111 | 07:19:49,435 | 192.168.25.121 |           3026
> >                                                                                                       Sending message to /192.168.25.111 | 07:19:49,435 | 192.168.25.121 |           3128
> >                                                                                                                         Request complete | 07:18:12,696 | 192.168.25.111 |           4387
> >
> >
> > Other Stats about the cluster:
> >
> > [root@cdev101 conf]# nodetool status
> > Datacenter: DC3
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address         Load       Tokens  Owns   Host ID                               Rack
> > UN  192.168.25.131  80.67 MB   256     34.2%  6ec61643-17d4-4a2e-8c44-57e08687a957  RAC1
> > Datacenter: DC2
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address         Load       Tokens  Owns   Host ID                               Rack
> > UN  192.168.25.121  79.46 MB   256     30.6%  976626fb-ea80-405b-abb0-eae703b0074d  RAC1
> > Datacenter: DC1
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address         Load       Tokens  Owns   Host ID                               Rack
> > UN  192.168.25.111  61.82 MB   256     35.2%  9475e2da-d926-42d0-83fb-0188d0f8f438  RAC1
> >
> >
> > cqlsh> describe keyspace kairosdb
> >
> > CREATE KEYSPACE kairosdb WITH replication = {
> >  'class': 'NetworkTopologyStrategy',
> >  'DC2': '1',
> >  'DC3': '1',
> >  'DC1': '1'
> > };
> >
> > USE kairosdb;
> >
> > CREATE TABLE data_points (
> >  key blob,
> >  column1 blob,
> >  value blob,
> >  PRIMARY KEY (key, column1)
> > ) WITH COMPACT STORAGE AND
> >  bloom_filter_fp_chance=0.010000 AND
> >  caching='KEYS_ONLY' AND
> >  comment='' AND
> >  dclocal_read_repair_chance=0.000000 AND
> >  gc_grace_seconds=864000 AND
> >  index_interval=128 AND
> >  read_repair_chance=1.000000 AND
> >  replicate_on_write='true' AND
> >  populate_io_cache_on_flush='false' AND
> >  default_time_to_live=0 AND
> >  speculative_retry='NONE' AND
> >  memtable_flush_period_in_ms=0 AND
> >  compaction={'class': 'SizeTieredCompactionStrategy'} AND
> >  compression={'sstable_compression': 'LZ4Compressor'};
> >
> >
> >
>
>
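[Editor's note] The schema quoted above has read_repair_chance=1.000000, which, as the replies in this thread explain, sends each read to replicas in every DC. A sketch of the likely remedy, using the table from the trace (the 0.1 value is illustrative, not from the thread):

```sql
-- Stop triggering global (cross-DC) read repair on every read;
-- keep a small chance of read repair within the local DC only.
ALTER TABLE kairosdb.data_points
  WITH read_repair_chance = 0.0
  AND dclocal_read_repair_chance = 0.1;
```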

Re: Question about READS in a multi DC environment.

Posted by Aaron Morton <aa...@thelastpickle.com>.
In this case I was not thinking about what was happening synchronously with the client request, only that the request was hitting all nodes. 

You are right: when reading at LOCAL_ONE, the coordinator only blocks for one response (the data response). 
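[Editor's note] This behaviour can be sketched with a toy event-loop model (hypothetical Python, not the real coordinator code): read requests go out to all replicas at once, the client-facing read completes on the first response, and the straggler responses are consumed afterwards (where a digest mismatch would trigger repair).

```python
import asyncio

async def replica_read(name, delay):
    # Stand-in for one replica answering a read (data or digest) request.
    await asyncio.sleep(delay)
    return name

async def local_one_read():
    # Requests to all three replicas are fired at the same time.
    tasks = [asyncio.create_task(replica_read(name, delay))
             for name, delay in [("DC1", 0.01), ("DC2", 0.2), ("DC3", 0.2)]]
    # LOCAL_ONE: block only until the first (local) response arrives.
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    first = done.pop().result()
    # In Cassandra the remaining responses are handled off the request
    # path; here we simply drain them so the loop shuts down cleanly.
    stragglers = await asyncio.gather(*pending)
    return first, stragglers

first, stragglers = asyncio.run(local_one_read())
```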

Cheers
Aaron
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 14/05/2014, at 11:36 am, graham sanderson <gr...@vast.com> wrote:

> Yeah, but all the requests for data/digest are sent at the same time… responses that aren’t “needed” to complete the request are dealt with asynchronously (possibly causing repair). 
> 
> In the original trace (which is confusing because I don’t think the clocks are in sync)… I don’t see anything that makes me believe it is blocking for all 3 responses; it actually does reads on all 3 nodes even if only digests are required
> 
> On May 12, 2014, at 12:37 AM, DuyHai Doan <do...@gmail.com> wrote:
> 
>> Isn't read repair supposed to be done asynchronously in the background?
>> 
>> 
>> On Mon, May 12, 2014 at 2:07 AM, graham sanderson <gr...@vast.com> wrote:
>> You have a read_repair_chance of 1.0 which is probably why your query is hitting all data centers.


Re: Question about READS in a multi DC environment.

Posted by graham sanderson <gr...@vast.com>.
Yeah, but all the requests for data/digest are sent at the same time… responses that aren’t “needed” to complete the request are dealt with asynchronously (possibly causing repair). 

In the original trace (which is confusing because I don’t think the clocks are in sync)… I don’t see anything that makes me believe it is blocking for all 3 responses; it actually does reads on all 3 nodes even if only digests are required
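[Editor's note] A toy model of which replicas a read touches (hypothetical Python, not driver or server code; DC names and IPs taken from the trace above): with read_repair_chance = 1.0 every read is sent to replicas in all DCs, while LOCAL_ONE still blocks on just one local response.

```python
import random

def plan_read(all_replicas, local_dc, read_repair_chance, rng=random):
    """Which replicas a LOCAL_ONE read touches (toy model, not server code).

    Returns (contacted, blocked_for): the replicas sent a read/digest
    request, and the subset the coordinator actually blocks on.
    """
    local = [r for r in all_replicas if r["dc"] == local_dc]
    if rng.random() < read_repair_chance:
        # Global read repair: the read goes to every replica in every DC,
        # which is why the trace shows all three nodes doing the read.
        contacted = list(all_replicas)
    else:
        # Plain LOCAL_ONE: one replica in the local DC is enough.
        contacted = local[:1]
    blocked_for = local[:1]  # LOCAL_ONE only ever waits on one response
    return contacted, blocked_for

replicas = [{"dc": "DC1", "ip": "192.168.25.111"},
            {"dc": "DC2", "ip": "192.168.25.121"},
            {"dc": "DC3", "ip": "192.168.25.131"}]

contacted, blocked = plan_read(replicas, "DC1", read_repair_chance=1.0)
```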

On May 12, 2014, at 12:37 AM, DuyHai Doan <do...@gmail.com> wrote:

> Isn't read repair supposed to be done asynchronously in the background?
> 
> 
> On Mon, May 12, 2014 at 2:07 AM, graham sanderson <gr...@vast.com> wrote:
> You have a read_repair_chance of 1.0 which is probably why your query is hitting all data centers.
> > Datacenter: DC2
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address         Load       Tokens  Owns   Host ID                               Rack
> > UN  192.168.25.121  79.46 MB   256     30.6%  976626fb-ea80-405b-abb0-eae703b0074d  RAC1
> > Datacenter: DC1
> > ===============
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address         Load       Tokens  Owns   Host ID                               Rack
> > UN  192.168.25.111  61.82 MB   256     35.2%  9475e2da-d926-42d0-83fb-0188d0f8f438  RAC1
> >
> >
> > cqlsh> describe keyspace kairosdb
> >
> > CREATE KEYSPACE kairosdb WITH replication = {
> >  'class': 'NetworkTopologyStrategy',
> >  'DC2': '1',
> >  'DC3': '1',
> >  'DC1': '1'
> > };
> >
> > USE kairosdb;
> >
> > CREATE TABLE data_points (
> >  key blob,
> >  column1 blob,
> >  value blob,
> >  PRIMARY KEY (key, column1)
> > ) WITH COMPACT STORAGE AND
> >  bloom_filter_fp_chance=0.010000 AND
> >  caching='KEYS_ONLY' AND
> >  comment='' AND
> >  dclocal_read_repair_chance=0.000000 AND
> >  gc_grace_seconds=864000 AND
> >  index_interval=128 AND
> >  read_repair_chance=1.000000 AND
> >  replicate_on_write='true' AND
> >  populate_io_cache_on_flush='false' AND
> >  default_time_to_live=0 AND
> >  speculative_retry='NONE' AND
> >  memtable_flush_period_in_ms=0 AND
> >  compaction={'class': 'SizeTieredCompactionStrategy'} AND
> >  compression={'sstable_compression': 'LZ4Compressor'};
> >
> >
> >
> 
> 


Re: Question about READS in a multi DC environment.

Posted by Aaron Morton <aa...@thelastpickle.com>.
> >  read_repair_chance=1.000000 AND

There’s your problem. 

When read repair is active for a read request the coordinator will over-read from all UP replicas. Your client request only blocks waiting for the one data request; the rest of the repair happens in the background. Setting this to 1.0 means it is active across the entire cluster for every read.
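So DuyHai's point and the trace are both consistent: the repair itself is asynchronous, but the extra replica reads are still issued on every request. A rough Python sketch of how a coordinator might pick the read-repair scope per request — illustrative only, not the actual Cassandra implementation; the function and scope names here are made up:

```python
import random

def choose_read_repair_scope(global_chance, dc_local_chance):
    """Pick a read-repair scope for one read request (simplified sketch).

    global_chance   ~ read_repair_chance
    dc_local_chance ~ dclocal_read_repair_chance
    """
    r = random.random()  # uniform in [0.0, 1.0)
    if r < global_chance:
        return "GLOBAL"    # over-read from all UP replicas in all DCs
    if r < global_chance + dc_local_chance:
        return "DC_LOCAL"  # over-read only from replicas in the local DC
    return "NONE"          # contact only the replicas the CL requires

# With read_repair_chance = 1.0, every single read is GLOBAL,
# which is exactly the cross-DC traffic seen in the trace above.
print(choose_read_repair_scope(1.0, 0.0))
```

With `global_chance=1.0` the first branch is always taken, so every read touches every data center regardless of the consistency level.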

Change read_repair_chance to 0 and set dclocal_read_repair_chance to 0.1 so that read repair only happens within the DC you are connected to.
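Given the schema shown earlier, something like the following ALTER TABLE should apply those settings (property syntax as in the CQL version used in this thread; verify against your Cassandra release):

```
ALTER TABLE kairosdb.data_points WITH
  read_repair_chance = 0.0 AND
  dclocal_read_repair_chance = 0.1;
```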

Hope that helps. 
A


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 12/05/2014, at 5:37 pm, DuyHai Doan <do...@gmail.com> wrote:

> Isn't read repair supposed to be done asynchronously in the background?
> 