Posted to commits@cassandra.apache.org by "Aleksei Zotov (Jira)" <ji...@apache.org> on 2021/07/12 22:46:00 UTC
[jira] [Commented] (CASSANDRA-12922) Bloom filter miss counts are not measured correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379453#comment-17379453 ]
Aleksei Zotov commented on CASSANDRA-12922:
-------------------------------------------
[~blambov]
I know you've already confirmed that the patch looks good, but I want to confirm that one use case is valid. I analyzed the code and see the following valid exit paths for the {{getPosition}} method:
||Use Case||Behavior||
|key is not present in BF|addTrueNegative and exit|
|key is present in Key Cache|addTruePositive and exit|
|key is not within sstable's keys range|addFalsePositive and exit|
|there is no index file|exit|
|key is not present in index file|*addFalsePositive (that's what we're fixing)* and exit|
|key is present in index file|addTruePositive and exit|
|else|addFalsePositive and exit|
The question is: don't we need to track a "false positive" if there is no index file? I know that a missing index file is not something we expect, but from the BF's perspective I see no difference between the "key is not present in index file" and "there is no index file" use cases. Please let me know your thoughts.
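To make the table concrete, the exit paths can be sketched roughly as below. This is a simplified model of the table only, not the actual {{BigTableReader}} code: the class {{BloomFilterAccounting}}, the {{Outcome}} enum, and all parameter names are hypothetical, the checks are assumed to run in the order the rows are listed, and the final "else" row is folded into the "not present in index file" case because both record a false positive.

```java
// Hypothetical model of the exit paths listed in the table; not Cassandra code.
class BloomFilterAccounting {

    /** What would be recorded for one lookup. UNTRACKED means nothing is recorded. */
    enum Outcome { TRUE_NEGATIVE, TRUE_POSITIVE, FALSE_POSITIVE, UNTRACKED }

    /**
     * Classifies a single lookup, assuming the checks happen in table order.
     * All parameters are illustrative inputs, not real reader state.
     */
    static Outcome classify(boolean bfSaysPresent,
                            boolean inKeyCache,
                            boolean withinKeyRange,
                            boolean hasIndexFile,
                            boolean foundInIndex) {
        if (!bfSaysPresent) {
            return Outcome.TRUE_NEGATIVE;   // BF rejected the key
        }
        if (inKeyCache) {
            return Outcome.TRUE_POSITIVE;   // BF said "maybe" and the key is cached
        }
        if (!withinKeyRange) {
            return Outcome.FALSE_POSITIVE;  // BF said "maybe" but the key is out of range
        }
        if (!hasIndexFile) {
            return Outcome.UNTRACKED;       // the case in question: exit without recording
        }
        // Index lookup; the "else" fall-through is folded into the not-found branch.
        return foundInIndex ? Outcome.TRUE_POSITIVE : Outcome.FALSE_POSITIVE;
    }

    public static void main(String[] args) {
        // BF passed, no key cache hit, key in range, but no index file.
        System.out.println(classify(true, false, true, false, false));
        // Same lookup with an index file that does not contain the key.
        System.out.println(classify(true, false, true, true, false));
    }
}
```

In this model the two lookups in {{main}} differ only in whether an index file exists, yet only the second one is counted. If the answer to the question is yes, the {{hasIndexFile}} branch would return {{FALSE_POSITIVE}} instead of {{UNTRACKED}}.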
cc: [~blerer]
-
> Bloom filter miss counts are not measured correctly
> ---------------------------------------------------
>
> Key: CASSANDRA-12922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12922
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Branimir Lambov
> Assignee: Benjamin Lerer
> Priority: Normal
> Labels: lhf
> Fix For: 4.x
>
> Attachments: 12922-trunk.txt
>
>
> Bloom filter hits and misses are evaluated incorrectly in {{BigTableReader.getPosition}}: we properly record hits, but not misses. In particular, if we don't find a match for a key in the index, which is where almost all non-matches will be rejected, [we don't record a bloom filter false positive|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java#L228].
> This leads to very misleading output from e.g. {{nodetool tablestats}}.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)