Posted to commits@cassandra.apache.org by "Paulo Motta (Jira)" <ji...@apache.org> on 2022/01/19 02:07:00 UTC

[jira] [Commented] (CASSANDRA-17267) Snapshot true size is miscalculated

    [ https://issues.apache.org/jira/browse/CASSANDRA-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478310#comment-17478310 ] 

Paulo Motta commented on CASSANDRA-17267:
-----------------------------------------

The snapshot true size is calculated by [Directories.getTrueAllocatedSizeIn|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L960].

This method creates an [SSTableSizeSummer|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L1054], passing the snapshot folder as the set of files to be iterated and counted, and the list of live sstables as the set of files to be skipped (the "toSkip" set).

The [isAcceptable|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/db/Directories.java#L1064] method decides whether a snapshot file's size must be counted by checking that the file is an sstable component and that it is not present in the "toSkip" set.

However, the snapshot files will never be present in the "toSkip" set, so the snapshot file sizes are always counted, whether or not a "corresponding" live sstable is found.

I believe the original implementer's intent was to verify that the "corresponding" live sstable file is present in the "toSkip" set, but the code doesn't reconstruct the original sstable file path from the snapshot file path before checking whether it is present in the set.
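
To illustrate why the lookup never matches, here is a minimal sketch. It is a simplified, hypothetical model of the check described above, not the Cassandra source: the class name, method name and file paths are made up for illustration, and the sstable-component check is left out.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

// Hypothetical, simplified model of the described check; not the actual Directories code.
final class ToSkipLookupSketch {
    // Models the behaviour described above: a file is counted unless
    // its own path is a member of the toSkip set.
    static boolean isCounted(Path snapshotFile, Set<Path> toSkip) {
        return !toSkip.contains(snapshotFile);
    }

    public static void main(String[] args) {
        // Illustrative paths only; the layout follows <table_dir>/snapshots/<snapshot_name>/file.
        Path live = Paths.get("/data/ks1/tbl1/mc-1-big-Data.db");
        Path snap = Paths.get("/data/ks1/tbl1/snapshots/test/mc-1-big-Data.db");

        // toSkip holds live sstable paths, so the snapshot path is never a member.
        Set<Path> toSkip = Collections.singleton(live);

        // The snapshot file is counted even though its live counterpart exists.
        System.out.println(isCounted(snap, toSkip));
    }
}
```

Because the set holds paths under the table directory and the lookup uses a path under the snapshot directory, the two can never be equal, regardless of whether the live sstable is still on disk.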

I created a [PR|https://github.com/apache/cassandra/pull/1408] with a reproduction and preliminary fix.

The reproduction can be found [in this test|https://github.com/apache/cassandra/pull/1408/files#diff-ef5be0b69d0440b76021282c4b24bad69770ef9419be260df2169f49921db377R346].

[The fix|https://github.com/apache/cassandra/pull/1408/files#diff-bb20d0c655884c2211213190ae4787ace619cdff4c0235f147db7dfbf1e7a869R1067] only counts the snapshot file size if the file is an sstable component *AND* a live sstable component can *not* be found at "snapshot_dir/../../file_name" (since the snapshot file is located at <table_dir>/snapshots/<snapshot_name>/file).
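
The path resolution behind this rule can be sketched as follows. This is only an illustration under stated assumptions, not the code in the PR: the class and method names are hypothetical, the sstable-component check is omitted, and file existence is passed in as a predicate so the example runs without touching the disk.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the "snapshot_dir/../../file_name" rule; not the PR's code.
final class SnapshotTrueSizeSketch {
    // Maps <table_dir>/snapshots/<snapshot_name>/<file> to <table_dir>/<file>.
    static Path liveCounterpart(Path snapshotFile) {
        Path snapshotDir = snapshotFile.getParent();          // <table_dir>/snapshots/<snapshot_name>
        Path tableDir = snapshotDir.getParent().getParent();  // <table_dir>
        return tableDir.resolve(snapshotFile.getFileName());
    }

    // A snapshot file adds to the true size only if no live counterpart exists.
    // The sstable-component check mentioned above is omitted for brevity.
    static boolean countsTowardTrueSize(Path snapshotFile, Predicate<Path> existsOnDisk) {
        return !existsOnDisk.test(liveCounterpart(snapshotFile));
    }

    public static void main(String[] args) {
        // Illustrative paths only.
        Path snap = Paths.get("/data/ks1/tbl1/snapshots/test/mc-1-big-Data.db");
        Set<Path> liveFiles = Collections.singleton(Paths.get("/data/ks1/tbl1/mc-1-big-Data.db"));

        // Live sstable still present: the snapshot file adds nothing to the true size.
        System.out.println(countsTowardTrueSize(snap, liveFiles::contains));
        // Live sstable gone (e.g. compacted away): the snapshot file is counted.
        System.out.println(countsTowardTrueSize(snap, p -> false));
    }
}
```

Against a real data directory, the predicate could be Files::exists from java.nio.file.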

After the fix, the same snapshot from the ticket description is displayed as follows:
{noformat}
$ nodetool listsnapshots
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
test          ks1           tbl1               0 bytes   5.69 KiB

Total TrueDiskSpaceUsed: 0 bytes
{noformat}

> Snapshot true size is miscalculated
> -----------------------------------
>
>                 Key: CASSANDRA-17267
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17267
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Normal
>
> As far as I understand, the snapshot "size on disk" is the total size of the snapshot, while the "true size" is the (size_on_disk - size_of_live_sstables).
> I created a snapshot on a 3.11 node without traffic and I expected the "true size" to be 0KB since the original sstables were still present, but this didn't seem to be the case:
> {noformat}
> $ nodetool listsnapshots
> Snapshot Details:
> Snapshot name Keyspace name Column family name True size Size on disk
> test          ks1           tbl1               4.86 KiB  5.69 KiB
> Total TrueDiskSpaceUsed: 4.86 KiB
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
