Posted to user@cassandra.apache.org by Joe Obernberger <jo...@gmail.com> on 2020/12/02 15:55:23 UTC

Digest mismatch

Hi All - this is my first post here.  I've been using Cassandra for 
several months now and am loving it.  We are moving from Apache HBase to 
Cassandra for a big data analytics platform.

I'm using Java to get rows from Cassandra and very frequently get a 
java.util.NoSuchElementException when iterating through a ResultSet.  If 
I retry the query (often several times), it works.  The debug log 
on the Cassandra nodes shows this message:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey

My cluster looks like this:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host 
ID                               Rack
UN  172.16.100.224  340.5 GiB  512          50.9% 
8ba646ac-2b33-49de-a220-ae9842f18806  rack1
UN  172.16.100.208  269.19 GiB  384          40.3% 
4e0ba42f-649b-425a-857a-34497eb3036e  rack1
UN  172.16.100.225  282.83 GiB  512          50.4% 
247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
UN  172.16.110.3    409.78 GiB  768          63.2% 
0abea102-06d2-4309-af36-a3163e8f00d8  rack1
UN  172.16.110.4    330.15 GiB  512          50.6% 
2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
UN  172.16.100.253  98.88 GiB  128          14.6% 
6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
UN  172.16.100.254  204.5 GiB  256          30.0% 
87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1

I suspect this has to do with how I'm using consistency levels? 
Typically I'm using ONE.  I just set the dclocal_read_repair_chance to 
0.0, but I'm still seeing the issue.  Any help/tips?
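
For reference, the consistency level can also be set per statement with 
the 4.x Java driver rather than only in application.conf.  This is only a 
sketch; the keyspace, table, and column names below are placeholders, not 
our real schema:

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class ReadAtLocalQuorum {
    public static void main(String[] args) {
        // Contact points and local datacenter come from application.conf on the classpath
        try (CqlSession session = CqlSession.builder().build()) {
            SimpleStatement stmt = SimpleStatement
                .newInstance("SELECT * FROM my_keyspace.my_table WHERE id = ?", "some-key")
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);  // per-request override
            for (Row row : session.execute(stmt)) {
                System.out.println(row.getFormattedContents());
            }
        }
    }
}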

Thank you!

-Joe Obernberger


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Python eh?  What's that?  Kidding.  (Java guy over here...)

I grepped the logs for mutations but only see messages like:

2020-09-14 16:15:19,963 CommitLog.java:149 - Log replay complete, 0 
replayed mutations
and
2020-09-17 16:22:13,020 CommitLog.java:149 - Log replay complete, 291708 
replayed mutations
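
(Side note: nodetool tpstats may be a more direct check than grepping the 
logs; the per-node dropped-message counts it prints at the end include a 
MUTATION row.)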

Typically, we read very soon after the write, which I thought was a 
problem also; however, at this point it's been 24+ hours since the data 
I'm now trying to read was written.  The problem happens very easily.
Once I've determined the partition key, how will that help?

-Joe

On 12/2/2020 12:16 PM, Steve Lacerda wrote:
> The digest mismatch typically shows the partition key info, with 
> something like this:
>
> DecoratedKey(-1671292413668442751, 48343732322d3838353032)
>
> That refers to the partition key, which you can gather like so:
>
> python
> import binascii
> binascii.unhexlify('48343732322d3838353032')
> 'H4722-88502'
>
> My assumption is that since you are reading and writing with one, that 
> some nodes have the data and others don't. Are you seeing any dropped 
> mutations in the logs? How long after the write are you attempting to 
> read the same data?
>
>
>
>
>
>
> On Wed, Dec 2, 2020 at 9:12 AM Joe Obernberger 
> <joseph.obernberger@gmail.com <ma...@gmail.com>> 
> wrote:
>
>     Hi Carl - thank you for replying.
>     I am using Cassandra 3.11.9-1
>
>     Rows are not typically being deleted - I assume you're referring
>     to Tombstones.  I don't think that should be the case here as I
>     don't think we've deleted anything here.
>     This is a test cluster and some of the machines are small (hence
>     the one node with 128 tokens and 14.6% - it has a lot less disk
>     space than the other nodes).  This is one of the features that I
>     really like with Cassandra - being able to size nodes based on
>     disk/CPU/RAM.
>
>     All data is currently written with ONE.  All data is read with
>     ONE.  I can replicate this issue at will, so can try different
>     things easily.  I tried changing the read process to use QUORUM
>     and the issue still takes place. Right now I'm running a 'nodetool
>     repair' to see if that helps.  Our largest table 'doc' has the
>     following stats:
>
>     Table: doc
>     SSTable count: 28
>     Space used (live): 113609995010
>     Space used (total): 113609995010
>     Space used by snapshots (total): 0
>     Off heap memory used (total): 225006197
>     SSTable Compression Ratio: 0.37730474570644196
>     Number of partitions (estimate): 93641747
>     Memtable cell count: 0
>     Memtable data size: 0
>     Memtable off heap memory used: 0
>     Memtable switch count: 3712
>     Local read count: 891065091
>     Local read latency: NaN ms
>     Local write count: 7448281135
>     Local write latency: NaN ms
>     Pending flushes: 0
>     Percent repaired: 0.0
>     Bloom filter false positives: 988
>     Bloom filter false ratio: 0.00001
>     Bloom filter space used: 151149880
>     Bloom filter off heap memory used: 151149656
>     Index summary off heap memory used: 38654701
>     Compression metadata off heap memory used: 35201840
>     Compacted partition minimum bytes: 104
>     Compacted partition maximum bytes: 3379391
>     Compacted partition mean bytes: 3389
>     Average live cells per slice (last five minutes): NaN
>     Maximum live cells per slice (last five minutes): 0
>     Average tombstones per slice (last five minutes): NaN
>     Maximum tombstones per slice (last five minutes): 0
>     Dropped Mutations: 8174438
>
>     Thoughts/ideas?  Thank you!
>
>     -Joe
>
>     On 12/2/2020 11:49 AM, Carl Mueller wrote:
>>     Why is one of your nodes only at 14.6% ownership? That's weird,
>>     unless you have a small rowcount.
>>
>>     Are you frequently deleting rows? Are you frequently writing rows
>>     at ONE?
>>
>>     What version of cassandra?
>>
>>
>>
>>     On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger
>>     <joseph.obernberger@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>         Hi All - this is my first post here.  I've been using
>>         Cassandra for
>>         several months now and am loving it.  We are moving from
>>         Apache HBase to
>>         Cassandra for a big data analytics platform.
>>
>>         I'm using java to get rows from Cassandra and very frequently
>>         get a
>>         java.util.NoSuchElementException when iterating through a
>>         ResultSet.  If
>>         I retry this query again (often several times), it works. 
>>         The debug log
>>         on the Cassandra nodes show this message:
>>         org.apache.cassandra.service.DigestMismatchException:
>>         Mismatch for key
>>         DecoratedKey
>>
>>         My cluster looks like this:
>>
>>         Datacenter: datacenter1
>>         =======================
>>         Status=Up/Down
>>         |/ State=Normal/Leaving/Joining/Moving
>>         --  Address         Load       Tokens       Owns (effective) 
>>         Host
>>         ID                               Rack
>>         UN  172.16.100.224  340.5 GiB  512          50.9%
>>         8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>>         UN  172.16.100.208  269.19 GiB  384          40.3%
>>         4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>>         UN  172.16.100.225  282.83 GiB  512          50.4%
>>         247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>>         UN  172.16.110.3    409.78 GiB  768          63.2%
>>         0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>>         UN  172.16.110.4    330.15 GiB  512          50.6%
>>         2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>>         UN  172.16.100.253  98.88 GiB  128          14.6%
>>         6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>>         UN  172.16.100.254  204.5 GiB  256          30.0%
>>         87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>
>>         I suspect this has to do with how I'm using consistency levels?
>>         Typically I'm using ONE.  I just set the
>>         dclocal_read_repair_chance to
>>         0.0, but I'm still seeing the issue.  Any help/tips?
>>
>>         Thank you!
>>
>>         -Joe Obernberger
>>
>>
>>         ---------------------------------------------------------------------
>>         To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>         <ma...@cassandra.apache.org>
>>         For additional commands, e-mail:
>>         user-help@cassandra.apache.org
>>         <ma...@cassandra.apache.org>
>>
>>
>
>
>
> -- 
> Steve Lacerda
> e. steve.lacerda@datastax.com <ma...@datastax.com>
> w. www.datastax.com <http://www.datastax.com>
>

Re: Digest mismatch

Posted by Steve Lacerda <st...@datastax.com>.
The digest mismatch typically shows the partition key info, with something
like this:

DecoratedKey(-1671292413668442751, 48343732322d3838353032)

That refers to the partition key, which you can gather like so:

python
import binascii
binascii.unhexlify('48343732322d3838353032')
'H4722-88502'
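
In Python 3 the same call returns bytes, i.e. b'H4722-88502'.  A rough 
Java equivalent, since the client side in this thread is Java, assuming 
the partition key is UTF-8 text:

import java.nio.charset.StandardCharsets;

public class HexKey {
    public static void main(String[] args) {
        String hex = "48343732322d3838353032";
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte of the key
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        System.out.println(new String(bytes, StandardCharsets.UTF_8));  // prints H4722-88502
    }
}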

My assumption is that since you are reading and writing with one, that some
nodes have the data and others don't. Are you seeing any dropped mutations
in the logs? How long after the write are you attempting to read the same
data?






On Wed, Dec 2, 2020 at 9:12 AM Joe Obernberger <jo...@gmail.com>
wrote:

> Hi Carl - thank you for replying.
> I am using Cassandra 3.11.9-1
>
> Rows are not typically being deleted - I assume you're referring to
> Tombstones.  I don't think that should be the case here as I don't think
> we've deleted anything here.
> This is a test cluster and some of the machines are small (hence the one
> node with 128 tokens and 14.6% - it has a lot less disk space than the
> other nodes).  This is one of the features that I really like with
> Cassandra - being able to size nodes based on disk/CPU/RAM.
>
> All data is currently written with ONE.  All data is read with ONE.  I can
> replicate this issue at will, so can try different things easily.  I tried
> changing the read process to use QUORUM and the issue still takes place.
> Right now I'm running a 'nodetool repair' to see if that helps.  Our
> largest table 'doc' has the following stats:
>
> Table: doc
> SSTable count: 28
> Space used (live): 113609995010
> Space used (total): 113609995010
> Space used by snapshots (total): 0
> Off heap memory used (total): 225006197
> SSTable Compression Ratio: 0.37730474570644196
> Number of partitions (estimate): 93641747
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 3712
> Local read count: 891065091
> Local read latency: NaN ms
> Local write count: 7448281135
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 988
> Bloom filter false ratio: 0.00001
> Bloom filter space used: 151149880
> Bloom filter off heap memory used: 151149656
> Index summary off heap memory used: 38654701
> Compression metadata off heap memory used: 35201840
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 3379391
> Compacted partition mean bytes: 3389
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 8174438
>
> Thoughts/ideas?  Thank you!
>
> -Joe
> On 12/2/2020 11:49 AM, Carl Mueller wrote:
>
> Why is one of your nodes only at 14.6% ownership? That's weird, unless you
> have a small rowcount.
>
> Are you frequently deleting rows? Are you frequently writing rows at ONE?
>
> What version of cassandra?
>
>
>
> On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger <
> joseph.obernberger@gmail.com> wrote:
>
>> Hi All - this is my first post here.  I've been using Cassandra for
>> several months now and am loving it.  We are moving from Apache HBase to
>> Cassandra for a big data analytics platform.
>>
>> I'm using java to get rows from Cassandra and very frequently get a
>> java.util.NoSuchElementException when iterating through a ResultSet.  If
>> I retry this query again (often several times), it works.  The debug log
>> on the Cassandra nodes show this message:
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey
>>
>> My cluster looks like this:
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Tokens       Owns (effective)  Host
>> ID                               Rack
>> UN  172.16.100.224  340.5 GiB  512          50.9%
>> 8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>> UN  172.16.100.208  269.19 GiB  384          40.3%
>> 4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>> UN  172.16.100.225  282.83 GiB  512          50.4%
>> 247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>> UN  172.16.110.3    409.78 GiB  768          63.2%
>> 0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>> UN  172.16.110.4    330.15 GiB  512          50.6%
>> 2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>> UN  172.16.100.253  98.88 GiB  128          14.6%
>> 6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>> UN  172.16.100.254  204.5 GiB  256          30.0%
>> 87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>
>> I suspect this has to do with how I'm using consistency levels?
>> Typically I'm using ONE.  I just set the dclocal_read_repair_chance to
>> 0.0, but I'm still seeing the issue.  Any help/tips?
>>
>> Thank you!
>>
>> -Joe Obernberger
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
>>
>
>
>

-- 
Steve Lacerda
e. steve.lacerda@datastax.com
w. www.datastax.com

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you.  OK - I can see from 'nodetool getendpoints keyspace table 
key' that 3 nodes respond as one would expect.  My theory is that once I 
encounter the error, a read repair is triggered, and by the time I 
execute nodetool, 3 nodes respond.

I tried a test with the same table, but with LOCAL_QUORUM on reads and 
writes of new data, and it works.  Thank you all for that!  If I don't 
care which version of the data is returned, then I should be able to use 
ONE on reads, if LOCAL_QUORUM was used on writes - yes?

-Joe

On 12/3/2020 12:49 AM, Erick Ramirez wrote:
>
>     Thank you Steve - once I have the key, how do I get to a node?
>
> Run this command to determine which replicas own the partition:
>
> $ nodetool getendpoints <keyspace> <table> <partition_key>
>
>     So if the propagation has not taken place and a node doesn't have
>     the data and is the first to 'be asked' the client will get no data?
>
> That's correct. It will not return data it doesn't have when querying 
> with a consistency of ONE. There are limited cases where ONE is 
> applicable. In most cases, a strong consistency of LOCAL_QUORUM is 
> recommended to avoid the scenario you described. Cheers!
>

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Some more info.

From Java, using the DataStax Java driver 4.9.0, I'm selecting an entire 
table; after about 17 million rows (the table is probably around 150 
million rows), I get:

com.datastax.oss.driver.api.core.servererrors.ReadFailureException: 
Cassandra failure during read query at consistency ONE (1 responses were 
required but only 0 replica responded, 1 failed)

It's almost as if the data was not written with LOCAL_QUORUM, but I've 
triple checked.

If I stop writes to the table and reduce the load on Cassandra, then the 
Java program works OK.  Presto queries still fail, but that might be a 
Presto issue.  Interestingly, they sometimes come back with the 
'Cassandra failure during read query' error almost immediately, but 
sometimes go through 140 million rows and then die.

Are regular table repairs required when using LOCAL_QUORUM?  I see no 
nodes down or disk failures.
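
In case it matters, the scan is just iterating the driver's auto-paged 
ResultSet.  A rough sketch of the read side, with a smaller page size and 
a longer per-statement timeout ("my_keyspace" and the numbers are made 
up; only the doc table name is real):

import java.time.Duration;
import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class ScanDoc {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            SimpleStatement scan = SimpleStatement
                .newInstance("SELECT * FROM my_keyspace.doc")  // my_keyspace is a placeholder
                .setPageSize(500)                    // smaller pages, smaller individual reads
                .setTimeout(Duration.ofSeconds(60))  // per-request timeout for this statement
                .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            long count = 0;
            for (Row row : session.execute(scan)) {  // driver fetches further pages as we iterate
                count++;
            }
            System.out.println("rows: " + count);
        }
    }
}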

-Joe

On 12/14/2020 9:41 AM, Joe Obernberger wrote:
>
> Thanks all for the help on this.  I've changed all my writes to 
> LOCAL_QUORUM, and same with reads.  Under a constant load of doing 
> writes to a table and reads from the same table, I'm still getting the:
>
> DEBUG [ReadRepairStage:372] 2020-12-14 09:36:09,002 
> ReadCallback.java:244 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
> DecoratedKey(-7287062361589376757, 
> 44535f313034335f333332353839305f323032302d31322d31325430302d31392d33312e3330335a) 
> (054250ecd7170b1707ec36c6f1798ed0 vs 5752eec36bff050dd363b7803c500a95)
>         at 
> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) 
> ~[apache-cassandra-3.11.9.jar:3.11.9]
>         at 
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:235) 
> ~[apache-cassandra-3.11.9.jar:3.11.9]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
> [na:1.8.0_272]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
> [na:1.8.0_272]
>         at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) 
> [apache-cassandra-3.11.9.jar:3.11.9]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]
>
> Under load this happens a lot; several times a second on each of the 
> server nodes.  I started with a new table and under light load, it 
> worked wonderfully - no issues.  But under heavy load, it still 
> occurs.  Is there a different setting?
> Also, when this happens, I cannot query the table from presto as I 
> then get the familiar:
>
> "Query 20201214_143949_00000_b3fnt failed: Cassandra timeout during 
> read query at consistency LOCAL_QUORUM (2 responses were required but 
> only 1 replica responded)"
>
> Changed presto to use ONE results in an error about 1 were required, 
> but only 1 responded.
>
> Any ideas?  Things to try?  Thanks!
>
> -Joe
>
> On 12/3/2020 12:49 AM, Erick Ramirez wrote:
>>
>>     Thank you Steve - once I have the key, how do I get to a node?
>>
>> Run this command to determine which replicas own the partition:
>>
>> $ nodetool getendpoints <keyspace> <table> <partition_key>
>>
>>     So if the propagation has not taken place and a node doesn't have
>>     the data and is the first to 'be asked' the client will get no data?
>>
>> That's correct. It will not return data it doesn't have when querying 
>> with a consistency of ONE. There are limited cases where ONE is 
>> applicable. In most cases, a strong consistency of LOCAL_QUORUM is 
>> recommended to avoid the scenario you described. Cheers!
>>

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Thanks all for the help on this.  I've changed all my writes to 
LOCAL_QUORUM, and same with reads.  Under a constant load of doing 
writes to a table and reads from the same table, I'm still getting the:

DEBUG [ReadRepairStage:372] 2020-12-14 09:36:09,002 
ReadCallback.java:244 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-7287062361589376757, 
44535f313034335f333332353839305f323032302d31322d31325430302d31392d33312e3330335a) 
(054250ecd7170b1707ec36c6f1798ed0 vs 5752eec36bff050dd363b7803c500a95)
         at 
org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) 
~[apache-cassandra-3.11.9.jar:3.11.9]
         at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:235) 
~[apache-cassandra-3.11.9.jar:3.11.9]
         at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_272]
         at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_272]
         at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) 
[apache-cassandra-3.11.9.jar:3.11.9]
         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]

Under load this happens a lot; several times a second on each of the 
server nodes.  I started with a new table and under light load, it 
worked wonderfully - no issues.  But under heavy load, it still occurs.  
Is there a different setting?
Also, when this happens, I cannot query the table from Presto, as I then 
get the familiar:

"Query 20201214_143949_00000_b3fnt failed: Cassandra timeout during read 
query at consistency LOCAL_QUORUM (2 responses were required but only 1 
replica responded)"

Changing Presto to use ONE results in a similar error, about 1 response 
being required but only 1 responding.

Any ideas?  Things to try?  Thanks!

-Joe

On 12/3/2020 12:49 AM, Erick Ramirez wrote:
>
>     Thank you Steve - once I have the key, how do I get to a node?
>
> Run this command to determine which replicas own the partition:
>
> $ nodetool getendpoints <keyspace> <table> <partition_key>
>
>     So if the propagation has not taken place and a node doesn't have
>     the data and is the first to 'be asked' the client will get no data?
>
> That's correct. It will not return data it doesn't have when querying 
> with a consistency of ONE. There are limited cases where ONE is 
> applicable. In most cases, a strong consistency of LOCAL_QUORUM is 
> recommended to avoid the scenario you described. Cheers!
>

Re: Digest mismatch

Posted by Erick Ramirez <er...@datastax.com>.
>
> Thank you Steve - once I have the key, how do I get to a node?
>
Run this command to determine which replicas own the partition:

$ nodetool getendpoints <keyspace> <table> <partition_key>

> So if the propagation has not taken place and a node doesn't have the data
> and is the first to 'be asked' the client will get no data?
>
That's correct. It will not return data it doesn't have when querying with
a consistency of ONE. There are limited cases where ONE is applicable. In
most cases, a strong consistency of LOCAL_QUORUM is recommended to avoid
the scenario you described. Cheers!

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Thank you Steve - once I have the key, how do I get to a node?

After reading some of the documentation, it looks like the 
load-balancing-policy below *is* a token-aware policy.  Perhaps writes 
need to be done with QUORUM; I don't know how long Cassandra will take 
to make sure replicas are consistent when doing ONE for all writes.  So 
if the propagation has not taken place and a node doesn't have the data 
and is the first to 'be asked', will the client get no data?
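
For what it's worth, the driver's default policy can only prefer replica 
nodes when a statement carries routing information, which prepared/bound 
statements provide automatically once the partition key is bound.  A 
minimal sketch, using the key decoded earlier in the thread and a 
hypothetical id column on the doc table:

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;

public class TokenAwareRead {
    public static void main(String[] args) {
        // "my_keyspace" and the "id" column are placeholders; "doc" is the real table name
        try (CqlSession session = CqlSession.builder().withKeyspace("my_keyspace").build()) {
            PreparedStatement ps = session.prepare("SELECT * FROM doc WHERE id = ?");
            // Binding the partition key lets the driver compute the routing token,
            // so the default load-balancing policy can favour replica nodes.
            Row row = session.execute(
                    ps.bind("H4722-88502")
                      .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .one();
            System.out.println(row == null ? "no row" : row.getFormattedContents());
        }
    }
}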

-Joe

On 12/2/2020 2:09 PM, Steve Lacerda wrote:
> If you can determine the key, then you can determine which nodes do 
> and do not have the data. You may be able to glean a bit more 
> information like that, maybe one node is having problems, versus 
> entire cluster.
>
> On Wed, Dec 2, 2020 at 9:32 AM Joe Obernberger 
> <joseph.obernberger@gmail.com <ma...@gmail.com>> 
> wrote:
>
>     Clients are using an application.conf like:
>
>     datastax-java-driver {
>       basic.request.timeout = 60 seconds
>       basic.request.consistency = ONE
>       basic.contact-points = ["172.16.110.3:9042
>     <http://172.16.110.3:9042>", "172.16.110.4:9042
>     <http://172.16.110.4:9042>", "172.16.100.208:9042
>     <http://172.16.100.208:9042>", "172.16.100.224:9042
>     <http://172.16.100.224:9042>", "172.16.100.225:9042
>     <http://172.16.100.225:9042>", "172.16.100.253:9042
>     <http://172.16.100.253:9042>", "172.16.100.254:9042
>     <http://172.16.100.254:9042>"]
>       basic.load-balancing-policy {
>             local-datacenter = datacenter1
>       }
>     }
>
>     So no, I'm not using a token aware policy.  I'm googling that
>     now...cuz I don't know what it is!
>
>     -Joe
>
>     On 12/2/2020 12:18 PM, Carl Mueller wrote:
>>     Are you using token aware policy for the driver?
>>
>>     If your writes are one and your reads are one, the propagation
>>     may not have happened depending on the coordinator that is used.
>>
>>     TokenAware will make that a bit better.
>>
>>     On Wed, Dec 2, 2020 at 11:12 AM Joe Obernberger
>>     <joseph.obernberger@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>         Hi Carl - thank you for replying.
>>         I am using Cassandra 3.11.9-1
>>
>>         Rows are not typically being deleted - I assume you're
>>         referring to Tombstones.  I don't think that should be the
>>         case here as I don't think we've deleted anything here.
>>         This is a test cluster and some of the machines are small
>>         (hence the one node with 128 tokens and 14.6% - it has a lot
>>         less disk space than the other nodes).  This is one of the
>>         features that I really like with Cassandra - being able to
>>         size nodes based on disk/CPU/RAM.
>>
>>         All data is currently written with ONE.  All data is read
>>         with ONE.  I can replicate this issue at will, so can try
>>         different things easily.  I tried changing the read process
>>         to use QUORUM and the issue still takes place.  Right now I'm
>>         running a 'nodetool repair' to see if that helps.  Our
>>         largest table 'doc' has the following stats:
>>
>>         Table: doc
>>         SSTable count: 28
>>         Space used (live): 113609995010
>>         Space used (total): 113609995010
>>         Space used by snapshots (total): 0
>>         Off heap memory used (total): 225006197
>>         SSTable Compression Ratio: 0.37730474570644196
>>         Number of partitions (estimate): 93641747
>>         Memtable cell count: 0
>>         Memtable data size: 0
>>         Memtable off heap memory used: 0
>>         Memtable switch count: 3712
>>         Local read count: 891065091
>>         Local read latency: NaN ms
>>         Local write count: 7448281135
>>         Local write latency: NaN ms
>>         Pending flushes: 0
>>         Percent repaired: 0.0
>>         Bloom filter false positives: 988
>>         Bloom filter false ratio: 0.00001
>>         Bloom filter space used: 151149880
>>         Bloom filter off heap memory used: 151149656
>>         Index summary off heap memory used: 38654701
>>         Compression metadata off heap memory used: 35201840
>>         Compacted partition minimum bytes: 104
>>         Compacted partition maximum bytes: 3379391
>>         Compacted partition mean bytes: 3389
>>         Average live cells per slice (last five minutes): NaN
>>         Maximum live cells per slice (last five minutes): 0
>>         Average tombstones per slice (last five minutes): NaN
>>         Maximum tombstones per slice (last five minutes): 0
>>         Dropped Mutations: 8174438
>>
>>         Thoughts/ideas?  Thank you!
>>
>>         -Joe
>>
>>         On 12/2/2020 11:49 AM, Carl Mueller wrote:
>>>         Why is one of your nodes only at 14.6% ownership? That's
>>>         weird, unless you have a small rowcount.
>>>
>>>         Are you frequently deleting rows? Are you frequently writing
>>>         rows at ONE?
>>>
>>>         What version of cassandra?
>>>
>>>
>>>
>>>         On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger
>>>         <joseph.obernberger@gmail.com
>>>         <ma...@gmail.com>> wrote:
>>>
>>>             Hi All - this is my first post here.  I've been using
>>>             Cassandra for
>>>             several months now and am loving it.  We are moving from
>>>             Apache HBase to
>>>             Cassandra for a big data analytics platform.
>>>
>>>             I'm using java to get rows from Cassandra and very
>>>             frequently get a
>>>             java.util.NoSuchElementException when iterating through
>>>             a ResultSet.  If
>>>             I retry this query again (often several times), it
>>>             works.  The debug log
>>>             on the Cassandra nodes show this message:
>>>             org.apache.cassandra.service.DigestMismatchException:
>>>             Mismatch for key
>>>             DecoratedKey
>>>
>>>             My cluster looks like this:
>>>
>>>             Datacenter: datacenter1
>>>             =======================
>>>             Status=Up/Down
>>>             |/ State=Normal/Leaving/Joining/Moving
>>>             --  Address         Load       Tokens Owns (effective) 
>>>             Host
>>>             ID                               Rack
>>>             UN  172.16.100.224  340.5 GiB  512 50.9%
>>>             8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>>>             UN  172.16.100.208  269.19 GiB  384 40.3%
>>>             4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>>>             UN  172.16.100.225  282.83 GiB  512 50.4%
>>>             247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>>>             UN  172.16.110.3    409.78 GiB  768 63.2%
>>>             0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>>>             UN  172.16.110.4    330.15 GiB  512 50.6%
>>>             2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>>>             UN  172.16.100.253  98.88 GiB  128 14.6%
>>>             6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>>>             UN  172.16.100.254  204.5 GiB  256 30.0%
>>>             87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>>
>>>             I suspect this has to do with how I'm using consistency
>>>             levels?
>>>             Typically I'm using ONE.  I just set the
>>>             dclocal_read_repair_chance to
>>>             0.0, but I'm still seeing the issue.  Any help/tips?
>>>
>>>             Thank you!
>>>
>>>             -Joe Obernberger
>>>
>>>
>>>             ---------------------------------------------------------------------
>>>             To unsubscribe, e-mail:
>>>             user-unsubscribe@cassandra.apache.org
>>>             <ma...@cassandra.apache.org>
>>>             For additional commands, e-mail:
>>>             user-help@cassandra.apache.org
>>>             <ma...@cassandra.apache.org>
>>>
>>>
>>
>
>
> -- 
> Steve Lacerda
> e. steve.lacerda@datastax.com <ma...@datastax.com>
> w. www.datastax.com <http://www.datastax.com>
>

Re: Digest mismatch

Posted by Steve Lacerda <st...@datastax.com>.
If you can determine the key, then you can determine which nodes do and do
not have the data. You may be able to glean a bit more information like
that, maybe one node is having problems, versus entire cluster.

On Wed, Dec 2, 2020 at 9:32 AM Joe Obernberger <jo...@gmail.com>
wrote:

> Clients are using an application.conf like:
>
> datastax-java-driver {
>   basic.request.timeout = 60 seconds
>   basic.request.consistency = ONE
>   basic.contact-points = ["172.16.110.3:9042", "172.16.110.4:9042", "
> 172.16.100.208:9042", "172.16.100.224:9042", "172.16.100.225:9042", "
> 172.16.100.253:9042", "172.16.100.254:9042"]
>   basic.load-balancing-policy {
>         local-datacenter = datacenter1
>   }
> }
>
> So no, I'm not using a token aware policy.  I'm googling that now...cuz I
> don't know what it is!
>
> -Joe
> On 12/2/2020 12:18 PM, Carl Mueller wrote:
>
> Are you using token aware policy for the driver?
>
> If your writes are one and your reads are one, the propagation may not
> have happened depending on the coordinator that is used.
>
> TokenAware will make that a bit better.
>
> On Wed, Dec 2, 2020 at 11:12 AM Joe Obernberger <
> joseph.obernberger@gmail.com> wrote:
>
>> Hi Carl - thank you for replying.
>> I am using Cassandra 3.11.9-1
>>
>> Rows are not typically being deleted - I assume you're referring to
>> Tombstones.  I don't think that should be the case here as I don't think
>> we've deleted anything here.
>> This is a test cluster and some of the machines are small (hence the one
>> node with 128 tokens and 14.6% - it has a lot less disk space than the
>> other nodes).  This is one of the features that I really like with
>> Cassandra - being able to size nodes based on disk/CPU/RAM.
>>
>> All data is currently written with ONE.  All data is read with ONE.  I
>> can replicate this issue at will, so can try different things easily.  I
>> tried changing the read process to use QUORUM and the issue still takes
>> place.  Right now I'm running a 'nodetool repair' to see if that helps.
>> Our largest table 'doc' has the following stats:
>>
>> Table: doc
>> SSTable count: 28
>> Space used (live): 113609995010
>> Space used (total): 113609995010
>> Space used by snapshots (total): 0
>> Off heap memory used (total): 225006197
>> SSTable Compression Ratio: 0.37730474570644196
>> Number of partitions (estimate): 93641747
>> Memtable cell count: 0
>> Memtable data size: 0
>> Memtable off heap memory used: 0
>> Memtable switch count: 3712
>> Local read count: 891065091
>> Local read latency: NaN ms
>> Local write count: 7448281135
>> Local write latency: NaN ms
>> Pending flushes: 0
>> Percent repaired: 0.0
>> Bloom filter false positives: 988
>> Bloom filter false ratio: 0.00001
>> Bloom filter space used: 151149880
>> Bloom filter off heap memory used: 151149656
>> Index summary off heap memory used: 38654701
>> Compression metadata off heap memory used: 35201840
>> Compacted partition minimum bytes: 104
>> Compacted partition maximum bytes: 3379391
>> Compacted partition mean bytes: 3389
>> Average live cells per slice (last five minutes): NaN
>> Maximum live cells per slice (last five minutes): 0
>> Average tombstones per slice (last five minutes): NaN
>> Maximum tombstones per slice (last five minutes): 0
>> Dropped Mutations: 8174438
>>
>> Thoughts/ideas?  Thank you!
>>
>> -Joe
>> On 12/2/2020 11:49 AM, Carl Mueller wrote:
>>
>> Why is one of your nodes only at 14.6% ownership? That's weird, unless
>> you have a small rowcount.
>>
>> Are you frequently deleting rows? Are you frequently writing rows at ONE?
>>
>> What version of cassandra?
>>
>>
>>
>> On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger <
>> joseph.obernberger@gmail.com> wrote:
>>
>>> Hi All - this is my first post here.  I've been using Cassandra for
>>> several months now and am loving it.  We are moving from Apache HBase to
>>> Cassandra for a big data analytics platform.
>>>
>>> I'm using java to get rows from Cassandra and very frequently get a
>>> java.util.NoSuchElementException when iterating through a ResultSet.  If
>>> I retry this query again (often several times), it works.  The debug log
>>> on the Cassandra nodes show this message:
>>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>>> DecoratedKey
>>>
>>> My cluster looks like this:
>>>
>>> Datacenter: datacenter1
>>> =======================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address         Load       Tokens       Owns (effective)  Host
>>> ID                               Rack
>>> UN  172.16.100.224  340.5 GiB  512          50.9%
>>> 8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>>> UN  172.16.100.208  269.19 GiB  384          40.3%
>>> 4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>>> UN  172.16.100.225  282.83 GiB  512          50.4%
>>> 247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>>> UN  172.16.110.3    409.78 GiB  768          63.2%
>>> 0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>>> UN  172.16.110.4    330.15 GiB  512          50.6%
>>> 2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>>> UN  172.16.100.253  98.88 GiB  128          14.6%
>>> 6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>>> UN  172.16.100.254  204.5 GiB  256          30.0%
>>> 87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>>
>>> I suspect this has to do with how I'm using consistency levels?
>>> Typically I'm using ONE.  I just set the dclocal_read_repair_chance to
>>> 0.0, but I'm still seeing the issue.  Any help/tips?
>>>
>>> Thank you!
>>>
>>> -Joe Obernberger
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>
>>>
>>
>>
>>

-- 
Steve Lacerda
e. steve.lacerda@datastax.com
w. www.datastax.com

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Clients are using an application.conf like:

datastax-java-driver {
   basic.request.timeout = 60 seconds
   basic.request.consistency = ONE
   basic.contact-points = ["172.16.110.3:9042", "172.16.110.4:9042", 
"172.16.100.208:9042", "172.16.100.224:9042", "172.16.100.225:9042", 
"172.16.100.253:9042", "172.16.100.254:9042"]
   basic.load-balancing-policy {
         local-datacenter = datacenter1
   }
}

So no, I'm not using a token aware policy.  I'm googling that now...cuz 
I don't know what it is!
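
(For reference, the same application.conf can also carry a stronger 
default consistency, e.g. basic.request.consistency = LOCAL_QUORUM, 
instead of ONE.)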

-Joe

On 12/2/2020 12:18 PM, Carl Mueller wrote:
> Are you using token aware policy for the driver?
>
> If your writes are one and your reads are one, the propagation may not 
> have happened depending on the coordinator that is used.
>
> TokenAware will make that a bit better.
>
> On Wed, Dec 2, 2020 at 11:12 AM Joe Obernberger 
> <joseph.obernberger@gmail.com <ma...@gmail.com>> 
> wrote:
>
>     Hi Carl - thank you for replying.
>     I am using Cassandra 3.11.9-1
>
>     Rows are not typically being deleted - I assume you're referring
>     to Tombstones.  I don't think that should be the case here as I
>     don't think we've deleted anything here.
>     This is a test cluster and some of the machines are small (hence
>     the one node with 128 tokens and 14.6% - it has a lot less disk
>     space than the other nodes).  This is one of the features that I
>     really like with Cassandra - being able to size nodes based on
>     disk/CPU/RAM.
>
>     All data is currently written with ONE.  All data is read with
>     ONE.  I can replicate this issue at will, so can try different
>     things easily.  I tried changing the read process to use QUORUM
>     and the issue still takes place. Right now I'm running a 'nodetool
>     repair' to see if that helps.  Our largest table 'doc' has the
>     following stats:
>
>     Table: doc
>     SSTable count: 28
>     Space used (live): 113609995010
>     Space used (total): 113609995010
>     Space used by snapshots (total): 0
>     Off heap memory used (total): 225006197
>     SSTable Compression Ratio: 0.37730474570644196
>     Number of partitions (estimate): 93641747
>     Memtable cell count: 0
>     Memtable data size: 0
>     Memtable off heap memory used: 0
>     Memtable switch count: 3712
>     Local read count: 891065091
>     Local read latency: NaN ms
>     Local write count: 7448281135
>     Local write latency: NaN ms
>     Pending flushes: 0
>     Percent repaired: 0.0
>     Bloom filter false positives: 988
>     Bloom filter false ratio: 0.00001
>     Bloom filter space used: 151149880
>     Bloom filter off heap memory used: 151149656
>     Index summary off heap memory used: 38654701
>     Compression metadata off heap memory used: 35201840
>     Compacted partition minimum bytes: 104
>     Compacted partition maximum bytes: 3379391
>     Compacted partition mean bytes: 3389
>     Average live cells per slice (last five minutes): NaN
>     Maximum live cells per slice (last five minutes): 0
>     Average tombstones per slice (last five minutes): NaN
>     Maximum tombstones per slice (last five minutes): 0
>     Dropped Mutations: 8174438
>
>     Thoughts/ideas?  Thank you!
>
>     -Joe
>
>     On 12/2/2020 11:49 AM, Carl Mueller wrote:
>>     Why is one of your nodes only at 14.6% ownership? That's weird,
>>     unless you have a small rowcount.
>>
>>     Are you frequently deleting rows? Are you frequently writing rows
>>     at ONE?
>>
>>     What version of cassandra?
>>
>>
>>
>>     On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger
>>     <joseph.obernberger@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>         Hi All - this is my first post here.  I've been using
>>         Cassandra for
>>         several months now and am loving it.  We are moving from
>>         Apache HBase to
>>         Cassandra for a big data analytics platform.
>>
>>         I'm using java to get rows from Cassandra and very frequently
>>         get a
>>         java.util.NoSuchElementException when iterating through a
>>         ResultSet.  If
>>         I retry this query again (often several times), it works. 
>>         The debug log
>>         on the Cassandra nodes show this message:
>>         org.apache.cassandra.service.DigestMismatchException:
>>         Mismatch for key
>>         DecoratedKey
>>
>>         My cluster looks like this:
>>
>>         Datacenter: datacenter1
>>         =======================
>>         Status=Up/Down
>>         |/ State=Normal/Leaving/Joining/Moving
>>         --  Address         Load       Tokens       Owns (effective) 
>>         Host
>>         ID                               Rack
>>         UN  172.16.100.224  340.5 GiB  512          50.9%
>>         8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>>         UN  172.16.100.208  269.19 GiB  384          40.3%
>>         4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>>         UN  172.16.100.225  282.83 GiB  512          50.4%
>>         247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>>         UN  172.16.110.3    409.78 GiB  768          63.2%
>>         0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>>         UN  172.16.110.4    330.15 GiB  512          50.6%
>>         2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>>         UN  172.16.100.253  98.88 GiB  128          14.6%
>>         6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>>         UN  172.16.100.254  204.5 GiB  256          30.0%
>>         87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>
>>         I suspect this has to do with how I'm using consistency levels?
>>         Typically I'm using ONE.  I just set the
>>         dclocal_read_repair_chance to
>>         0.0, but I'm still seeing the issue.  Any help/tips?
>>
>>         Thank you!
>>
>>         -Joe Obernberger
>>
>>
>>         ---------------------------------------------------------------------
>>         To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>         <ma...@cassandra.apache.org>
>>         For additional commands, e-mail:
>>         user-help@cassandra.apache.org
>>         <ma...@cassandra.apache.org>
>>
>>
>

Re: Digest mismatch

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Are you using token aware policy for the driver?

If your writes are one and your reads are one, the propagation may not have
happened depending on the coordinator that is used.

TokenAware will make that a bit better.

On Wed, Dec 2, 2020 at 11:12 AM Joe Obernberger <
joseph.obernberger@gmail.com> wrote:

> Hi Carl - thank you for replying.
> I am using Cassandra 3.11.9-1
>
> Rows are not typically being deleted - I assume you're referring to
> Tombstones.  I don't think that should be the case here as I don't think
> we've deleted anything here.
> This is a test cluster and some of the machines are small (hence the one
> node with 128 tokens and 14.6% - it has a lot less disk space than the
> other nodes).  This is one of the features that I really like with
> Cassandra - being able to size nodes based on disk/CPU/RAM.
>
> All data is currently written with ONE.  All data is read with ONE.  I can
> replicate this issue at will, so can try different things easily.  I tried
> changing the read process to use QUORUM and the issue still takes place.
> Right now I'm running a 'nodetool repair' to see if that helps.  Our
> largest table 'doc' has the following stats:
>
> Table: doc
> SSTable count: 28
> Space used (live): 113609995010
> Space used (total): 113609995010
> Space used by snapshots (total): 0
> Off heap memory used (total): 225006197
> SSTable Compression Ratio: 0.37730474570644196
> Number of partitions (estimate): 93641747
> Memtable cell count: 0
> Memtable data size: 0
> Memtable off heap memory used: 0
> Memtable switch count: 3712
> Local read count: 891065091
> Local read latency: NaN ms
> Local write count: 7448281135
> Local write latency: NaN ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 988
> Bloom filter false ratio: 0.00001
> Bloom filter space used: 151149880
> Bloom filter off heap memory used: 151149656
> Index summary off heap memory used: 38654701
> Compression metadata off heap memory used: 35201840
> Compacted partition minimum bytes: 104
> Compacted partition maximum bytes: 3379391
> Compacted partition mean bytes: 3389
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 8174438
>
> Thoughts/ideas?  Thank you!
>
> -Joe
> On 12/2/2020 11:49 AM, Carl Mueller wrote:
>
> Why is one of your nodes only at 14.6% ownership? That's weird, unless you
> have a small rowcount.
>
> Are you frequently deleting rows? Are you frequently writing rows at ONE?
>
> What version of cassandra?
>
>
>
> On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger <
> joseph.obernberger@gmail.com> wrote:
>
>> Hi All - this is my first post here.  I've been using Cassandra for
>> several months now and am loving it.  We are moving from Apache HBase to
>> Cassandra for a big data analytics platform.
>>
>> I'm using java to get rows from Cassandra and very frequently get a
>> java.util.NoSuchElementException when iterating through a ResultSet.  If
>> I retry this query again (often several times), it works.  The debug log
>> on the Cassandra nodes show this message:
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey
>>
>> My cluster looks like this:
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Tokens       Owns (effective)  Host
>> ID                               Rack
>> UN  172.16.100.224  340.5 GiB  512          50.9%
>> 8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>> UN  172.16.100.208  269.19 GiB  384          40.3%
>> 4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>> UN  172.16.100.225  282.83 GiB  512          50.4%
>> 247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>> UN  172.16.110.3    409.78 GiB  768          63.2%
>> 0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>> UN  172.16.110.4    330.15 GiB  512          50.6%
>> 2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>> UN  172.16.100.253  98.88 GiB  128          14.6%
>> 6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>> UN  172.16.100.254  204.5 GiB  256          30.0%
>> 87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>>
>> I suspect this has to do with how I'm using consistency levels?
>> Typically I'm using ONE.  I just set the dclocal_read_repair_chance to
>> 0.0, but I'm still seeing the issue.  Any help/tips?
>>
>> Thank you!
>>
>> -Joe Obernberger
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: user-help@cassandra.apache.org
>>
>>
>
>
>

Re: Digest mismatch

Posted by Joe Obernberger <jo...@gmail.com>.
Hi Carl - thank you for replying.
I am using Cassandra 3.11.9-1

Rows are not typically being deleted - I assume you're referring to 
Tombstones.  I don't think that should be the case here as I don't think 
we've deleted anything here.
This is a test cluster and some of the machines are small (hence the one 
node with 128 tokens and 14.6% - it has a lot less disk space than the 
other nodes).  This is one of the features that I really like with 
Cassandra - being able to size nodes based on disk/CPU/RAM.

All data is currently written with ONE.  All data is read with ONE.  I 
can replicate this issue at will, so can try different things easily.  I 
tried changing the read process to use QUORUM and the issue still takes 
place.  Right now I'm running a 'nodetool repair' to see if that helps.  
Our largest table 'doc' has the following stats:

Table: doc
SSTable count: 28
Space used (live): 113609995010
Space used (total): 113609995010
Space used by snapshots (total): 0
Off heap memory used (total): 225006197
SSTable Compression Ratio: 0.37730474570644196
Number of partitions (estimate): 93641747
Memtable cell count: 0
Memtable data size: 0
Memtable off heap memory used: 0
Memtable switch count: 3712
Local read count: 891065091
Local read latency: NaN ms
Local write count: 7448281135
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 988
Bloom filter false ratio: 0.00001
Bloom filter space used: 151149880
Bloom filter off heap memory used: 151149656
Index summary off heap memory used: 38654701
Compression metadata off heap memory used: 35201840
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 3389
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 8174438

Thoughts/ideas?  Thank you!

-Joe

On 12/2/2020 11:49 AM, Carl Mueller wrote:
> Why is one of your nodes only at 14.6% ownership? That's weird, unless 
> you have a small rowcount.
>
> Are you frequently deleting rows? Are you frequently writing rows at ONE?
>
> What version of cassandra?
>
>
>
> On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger 
> <joseph.obernberger@gmail.com <ma...@gmail.com>> 
> wrote:
>
>     Hi All - this is my first post here.  I've been using Cassandra for
>     several months now and am loving it.  We are moving from Apache
>     HBase to
>     Cassandra for a big data analytics platform.
>
>     I'm using java to get rows from Cassandra and very frequently get a
>     java.util.NoSuchElementException when iterating through a
>     ResultSet.  If
>     I retry this query again (often several times), it works.  The
>     debug log
>     on the Cassandra nodes show this message:
>     org.apache.cassandra.service.DigestMismatchException: Mismatch for
>     key
>     DecoratedKey
>
>     My cluster looks like this:
>
>     Datacenter: datacenter1
>     =======================
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens       Owns (effective) Host
>     ID                               Rack
>     UN  172.16.100.224  340.5 GiB  512          50.9%
>     8ba646ac-2b33-49de-a220-ae9842f18806  rack1
>     UN  172.16.100.208  269.19 GiB  384          40.3%
>     4e0ba42f-649b-425a-857a-34497eb3036e  rack1
>     UN  172.16.100.225  282.83 GiB  512          50.4%
>     247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
>     UN  172.16.110.3    409.78 GiB  768          63.2%
>     0abea102-06d2-4309-af36-a3163e8f00d8  rack1
>     UN  172.16.110.4    330.15 GiB  512          50.6%
>     2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
>     UN  172.16.100.253  98.88 GiB  128          14.6%
>     6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
>     UN  172.16.100.254  204.5 GiB  256          30.0%
>     87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>
>     I suspect this has to do with how I'm using consistency levels?
>     Typically I'm using ONE.  I just set the
>     dclocal_read_repair_chance to
>     0.0, but I'm still seeing the issue.  Any help/tips?
>
>     Thank you!
>
>     -Joe Obernberger
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>     <ma...@cassandra.apache.org>
>     For additional commands, e-mail: user-help@cassandra.apache.org
>     <ma...@cassandra.apache.org>
>
>

Re: Digest mismatch

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Why is one of your nodes only at 14.6% ownership? That's weird, unless you
have a small rowcount.

Are you frequently deleting rows? Are you frequently writing rows at ONE?

What version of cassandra?



On Wed, Dec 2, 2020 at 9:56 AM Joe Obernberger <jo...@gmail.com>
wrote:

> Hi All - this is my first post here.  I've been using Cassandra for
> several months now and am loving it.  We are moving from Apache HBase to
> Cassandra for a big data analytics platform.
>
> I'm using java to get rows from Cassandra and very frequently get a
> java.util.NoSuchElementException when iterating through a ResultSet.  If
> I retry this query again (often several times), it works.  The debug log
> on the Cassandra nodes show this message:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey
>
> My cluster looks like this:
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens       Owns (effective)  Host
> ID                               Rack
> UN  172.16.100.224  340.5 GiB  512          50.9%
> 8ba646ac-2b33-49de-a220-ae9842f18806  rack1
> UN  172.16.100.208  269.19 GiB  384          40.3%
> 4e0ba42f-649b-425a-857a-34497eb3036e  rack1
> UN  172.16.100.225  282.83 GiB  512          50.4%
> 247f3d70-d13b-4d68-9a53-2ed58e01a63e  rack1
> UN  172.16.110.3    409.78 GiB  768          63.2%
> 0abea102-06d2-4309-af36-a3163e8f00d8  rack1
> UN  172.16.110.4    330.15 GiB  512          50.6%
> 2a5ae735-6304-4e99-924b-44d9d5ec86b7  rack1
> UN  172.16.100.253  98.88 GiB  128          14.6%
> 6b528b0b-d7f7-4378-bba8-1857802d4f18  rack1
> UN  172.16.100.254  204.5 GiB  256          30.0%
> 87d0cb48-a57d-460e-bd82-93e6e52e93ea  rack1
>
> I suspect this has to do with how I'm using consistency levels?
> Typically I'm using ONE.  I just set the dclocal_read_repair_chance to
> 0.0, but I'm still seeing the issue.  Any help/tips?
>
> Thank you!
>
> -Joe Obernberger
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>