You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Marcus Eriksson (JIRA)" <ji...@apache.org> on 2019/08/15 18:34:00 UTC

[jira] [Assigned] (CASSANDRA-14941) Expired secondary index sstables are not promptly discarded under TWCS

     [ https://issues.apache.org/jira/browse/CASSANDRA-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson reassigned CASSANDRA-14941:
-------------------------------------------

    Assignee: Marcus Eriksson

> Expired secondary index sstables are not promptly discarded under TWCS
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14941
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index
>            Reporter: Samuel Klock
>            Assignee: Marcus Eriksson
>            Priority: Normal
>
> We have a table in a cluster running 3.0.17 storing roughly time-series data using TWCS with a secondary index. We've noticed that while expired sstables for the table are discarded mostly when we expect them to be, the expired sstables for the secondary index would linger for weeks longer than expected – essentially indefinitely. Eventually the sstables would fill disks, which would require manual steps (deleting ancient index sstables) to address. We verified with {{sstableexpiredblockers}} that there wasn't anything on disk blocking the expired sstables from being dropped, so this looks like a bug.
> Through some debugging, we traced the problem to the index's memtables, which were consistently (except _just_ after node restarts) reporting a minimum timestamp from September 2015 – much older than any of our live data – which causes {{CompactionController.getFullyExpiredSSTables()}} to consistently return an empty set. The reason that the index sstables report this minimum timestamp is because of how index updates are created, using {{PartitionUpdate.singleRowUpdate()}}:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == null ? Columns.NONE : Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
>             EncodingStats.NO_STATS
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
>     }
> {code}
> The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest timestamp in the resulting {{PartitionUpdate}} is from September 2015. That timestamp becomes the minimum for the memtable.
> Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
> {code:java}
>     public static PartitionUpdate singleRowUpdate(CFMetaData metadata, DecoratedKey key, Row row, Row staticRow)
>     {
>         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
>         staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
>         EncodingStats stats = EncodingStats.Collector.collect(staticRow,
>                                                               (row == null ?
>                                                                Collections.emptyIterator() :
>                                                                Iterators.singletonIterator(row)),
>                                                               deletionInfo);
>         Holder holder = new Holder(
>             new PartitionColumns(
>                 staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE : Columns.from(staticRow.columns()),
>                 row == null ? Columns.NONE : Columns.from(row.columns())
>             ),
>             row == null ? BTree.empty() : BTree.singleton(row),
>             deletionInfo,
>             staticRow,
>             stats
>         );
>         return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
>     }
> {code}
> (i.e., computing an {{EncodingStats}} from the contents of the update) seems to fix the problem. However, we're not certain whether A) there's a functional reason the method was using {{EncodingStats.NO_STATS}} previously or B) whether the {{EncodingStats}} the revised version creates is correct (in particular, the use of {{deletionInfo}} feels a little suspect). We're also not sure whether there's a more appropriate fix (e.g., changing how the memtables compute the minimum timestamp, particularly in the {{NO_STATS}} case).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org