Posted to user@cassandra.apache.org by Saurabh Chandolia <s....@gmail.com> on 2015/06/02 17:15:52 UTC

Different number of records from COPY command

I am seeing a different number of records each time I export a particular
table. There were no reads or writes on this table while the data was being
exported, so I cannot understand why this is happening.
Am I missing something here?

Cassandra version: 2.1.4
Java driver version: 2.1.5
Cluster size: 4 nodes in the same DC
Keyspace replication factor: 2

The following commands were issued:
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 68000 rows; Write: 3025.93 rows/s
68682 rows exported in 27.737 seconds.

cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 65000 rows; Write: 2821.06 rows/s
65535 rows exported in 26.667 seconds.

cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 66000 rows; Write: 3285.07 rows/s
66055 rows exported in 26.269 seconds.
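Each run above targets the same file name, 'adclicklog20150528.csv', so the runs cannot easily be compared afterwards. One way to see which values go missing is to export each run to its own file (for example TO 'run1.csv' and TO 'run2.csv'; these file names are hypothetical) and diff the results. A minimal sketch, assuming each file holds one imprid value per line as produced by the COPY command above; missing_rows is an illustrative name, not an existing tool:

```shell
#!/bin/sh
# Illustrative helper: print lines that appear in export $1 but not in export $2.
# comm needs sorted input, so both files are sorted bytewise (LC_ALL=C).
# Duplicate values are matched one-for-one, so repeated imprid values still count.
missing_rows() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || { rm -f "$sorted_a"; return 1; }
    LC_ALL=C sort "$1" > "$sorted_a"
    LC_ALL=C sort "$2" > "$sorted_b"
    LC_ALL=C comm -23 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}
```

For example, `missing_rows run1.csv run2.csv | wc -l` counts the values present in the first export but absent from the second; swapping the arguments shows the reverse direction.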


cfstats for adlog.adclicklog20150528:
-------------------------------------------
$ nodetool cfstats adlog.adclicklog20150528
Keyspace: adlog
	Read Count: 217
	Read Latency: 2.773073732718894 ms.
	Write Count: 103191
	Write Latency: 0.10233075558915021 ms.
	Pending Flushes: 0
		Table: adclicklog20150528
		SSTable count: 11
		Space used (live): 37981202
		Space used (total): 37981202
		Space used by snapshots (total): 13407843
		Off heap memory used (total): 25580
		SSTable Compression Ratio: 0.26684147550494164
		Number of keys (estimate): 5627
		Memtable cell count: 94620
		Memtable data size: 13459445
		Memtable off heap memory used: 0
		Memtable switch count: 19
		Local read count: 217
		Local read latency: 2.774 ms
		Local write count: 103191
		Local write latency: 0.103 ms
		Pending flushes: 0
		Bloom filter false positives: 0
		Bloom filter false ratio: 0.00000
		Bloom filter space used: 7192
		Bloom filter off heap memory used: 7104
		Index summary off heap memory used: 980
		Compression metadata off heap memory used: 17496
		Compacted partition minimum bytes: 1110
		Compacted partition maximum bytes: 182785
		Compacted partition mean bytes: 27808
		Average live cells per slice (last five minutes): 44.663594470046085
		Maximum live cells per slice (last five minutes): 86.0
		Average tombstones per slice (last five minutes): 0.0
		Maximum tombstones per slice (last five minutes): 0.0

----------------

- Saurabh

RE: Different number of records from COPY command

Posted by "Vanlerberghe, Luc" <Lu...@bvdinfo.com>.
You’re probably hitting https://issues.apache.org/jira/browse/CASSANDRA-8940: Inconsistent select count and select distinct
It’s resolved (as I understand it, a non-thread-safe object was shared between threads), and the patch will be included in 2.1.6 and 2.0.16.

It’s a showstopper for me too: while developing, I sometimes need to rebuild things from the complete dataset (this should become *very* rare in production, but still).
However, as long as this bug is around, I can never be sure all records are included.

Unfortunately, I don’t see any schedule for releasing either version…

Luc
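To check whether a node already runs a release with the fix, its version can be compared with the two releases named above: 2.0.16 on the 2.0 branch and 2.1.6 on the 2.1 branch. A minimal sketch, assuming GNU sort with -V for version ordering; has_8940_fix and version_ge are illustrative names, and any other branch is reported as unknown (exit status 2) because the thread only names those two releases:

```shell
#!/bin/sh
# Succeeds (exit 0) if version $1 is at least version $2.
# sort -V orders version strings numerically, so 2.1.10 sorts after 2.1.6.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Illustrative helper: exit 0 = fixed, 1 = affected, 2 = branch not covered here.
has_8940_fix() {
    case "$1" in
        2.0.*) version_ge "$1" 2.0.16 ;;
        2.1.*) version_ge "$1" 2.1.6 ;;
        *) return 2 ;;
    esac
}
```

For example, `has_8940_fix "$(nodetool version | awk '{print $2}')"`, assuming nodetool version prints a single line such as ReleaseVersion: 2.1.4. For the 2.1.4 cluster in the original post, has_8940_fix 2.1.4 returns 1 (affected).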


From: Josef Lindman Hörnlund [mailto:josef@appdata.biz]
Sent: Wednesday, 3 June 2015 12:16
To: user@cassandra.apache.org
Subject: Re: Different number of records from COPY command


I ran into that issue a while ago, and it was because I hit the tombstone limit on one of the nodes. Try running `nodetool compact adlog adclicklog20150528` and see if that helps.

Josef Lindman Hörnlund


Re: Different number of records from COPY command

Posted by Josef Lindman Hörnlund <jo...@appdata.biz>.
I ran into that issue a while ago, and it was because I hit the tombstone limit on one of the nodes. Try running `nodetool compact adlog adclicklog20150528` and see if that helps.

Josef Lindman Hörnlund

> On 02 Jun 2015, at 17:48, Saurabh Chandolia <s....@gmail.com> wrote:
> 
> Still getting inconsistent number of records on consistency ALL and QUORUM.


Re: Different number of records from COPY command

Posted by Saurabh Chandolia <s....@gmail.com>.
I am still getting an inconsistent number of records with consistency ALL and QUORUM.
Here is the output for both consistency levels:

cqlsh:adlog> CONSISTENCY ALL;
Consistency level set to ALL.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 58000 rows; Write: 3065.60 rows/s
58463 rows exported in 21.353 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 63000 rows; Write: 3517.03 rows/s
63972 rows exported in 22.885 seconds.

cqlsh:adlog> CONSISTENCY QUORUM ;
Consistency level set to QUORUM.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 63000 rows; Write: 3443.37 rows/s
63440 rows exported in 21.987 seconds.
cqlsh:adlog> copy adclicklog20150528 (imprid) TO 'adclicklog20150528.csv';
Processed 65000 rows; Write: 3405.90 rows/s
65524 rows exported in 24.053 seconds.
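With a replication factor of 2, QUORUM and ALL wait for the same number of replica responses, so the two sets of runs above are not independent checks. A minimal sketch of the arithmetic, assuming a single data center, where QUORUM is floor(RF / 2) + 1 replicas; the function name quorum is only illustrative:

```shell
#!/bin/sh
# Illustrative helper: replicas QUORUM waits for in a single data center,
# floor(RF / 2) + 1 (shell integer division already rounds down).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

rf=2  # replication factor of the adlog keyspace, from the original post
echo "RF=$rf: QUORUM waits for $(quorum "$rf") replicas, ALL waits for $rf"
```

This prints "RF=2: QUORUM waits for 2 replicas, ALL waits for 2". With RF 3 the same formula gives 2, which is where QUORUM and ALL start to differ.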


- Saurabh

On Tue, Jun 2, 2015 at 9:09 PM, Anuj Wadehra <an...@yahoo.co.in> wrote:

> I have never exported data myself but can u just try setting 'consistency
> ALL' on cqlsh before executing command?

Re: Different number of records from COPY command

Posted by Anuj Wadehra <an...@yahoo.co.in>.
I have never exported data myself, but can you try setting 'consistency ALL' in cqlsh before running the command?


Thanks
Anuj Wadehra
