Posted to commits@cassandra.apache.org by "Thomas Steinmaurer (Jira)" <ji...@apache.org> on 2021/11/10 06:56:00 UTC

[jira] [Commented] (CASSANDRA-16619) Loss of commit log data possible after sstable ingest

    [ https://issues.apache.org/jira/browse/CASSANDRA-16619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441530#comment-17441530 ] 

Thomas Steinmaurer commented on CASSANDRA-16619:
------------------------------------------------

Regarding the WARN log that was introduced by this ticket, e.g.:
{noformat}
WARN  [main] 2021-11-08 21:54:06,826 CommitLogReplayer.java:253 - Origin of 1 sstables is unknown or doesn't match the local node; commitLogIntervals for them were ignored
{noformat}

While I understand the intention to guard against SSTables that have been copied between nodes (or e.g. brought in by a restore), the WARN log also appears when Cassandra 3.11.11 reads pre-"*me*" SSTables, e.g. those written by 3.11.10. I understand that the WARN log will eventually go away on its own, and certainly (I guess?) after running "nodetool upgradesstables".

This sort of WARN log has caused considerable confusion and customer interaction in on-premise customer installations.
* Would it be possible to WARN only if we are in the context of a "me" SSTable, to avoid confusion after upgrading from pre-3.11.11?
* Would it be possible to mention the SSTable minor version upgrade in e.g. {{NEWS.txt}} (or perhaps I missed it), as there might be tooling out there which counts the number of SSTables per "format" via the file name?
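
To illustrate the kind of tooling meant in the second point, here is a minimal sketch of counting SSTables per format version by file name. It assumes data file names of the form <version>-<generation>-<format>-Data.db (e.g. me-2-big-Data.db); the class and method names are hypothetical and not part of Cassandra, and the file names in any usage are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Cassandra.
final class SSTableVersionCounter {

    // Returns the leading version token (e.g. "me") of a data file name,
    // or null if the name does not look like "<version>-...-Data.db".
    // The naming scheme is an assumption of this sketch.
    static String versionOf(String fileName) {
        if (!fileName.endsWith("-Data.db")) {
            return null; // count each SSTable once, via its Data component
        }
        int dash = fileName.indexOf('-');
        return dash > 0 ? fileName.substring(0, dash) : null;
    }

    // Counts data files per version token; other components are ignored.
    static Map<String, Integer> countByVersion(List<String> fileNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : fileNames) {
            String version = versionOf(name);
            if (version != null) {
                counts.merge(version, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Tooling written this way starts reporting a new bucket as soon as a node writes SSTables with a new version token, which is why a note in {{NEWS.txt}} would help.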

Many thanks.

> Loss of commit log data possible after sstable ingest
> -----------------------------------------------------
>
>                 Key: CASSANDRA-16619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16619
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Commit Log
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 3.0.25, 3.11.11, 4.0-rc2, 4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> SSTable metadata contains commit log positions of the sstable. These positions are used to filter out mutations from the commit log on restart and only make sense for the node on which the data was flushed.
> If an SSTable is moved between nodes, these positions may cover regions that the receiving node has not yet flushed, and result in valid data being lost should these sections of the commit log need to be replayed.
> Solution:
> The chosen solution introduces a new sstable metadata field (in StatsMetadata), originatingHostId (UUID), which is the local host id of the node on which the sstable was created, or null if not known. Commit log intervals from an sstable are taken into account during Commit Log replay only when the originatingHostId of the sstable matches the local node's hostId.
> For new sstables the originatingHostId is set according to StorageService's local hostId.
> For compacted sstables the originatingHostId is set according to StorageService's local hostId, and only commit log intervals from local sstables are preserved in the resulting sstable.
> discovered by [~jakubzytka]
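
The filtering rule described in the quoted solution can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual CommitLogReplayer code: SSTableInfo, Result and select are hypothetical names, and a commit log interval is reduced to a {start, end} pair of longs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical illustration of the origin check; not Cassandra's implementation.
final class OriginFilterSketch {

    // Simplified stand-in for the relevant sstable metadata.
    static final class SSTableInfo {
        final UUID originatingHostId; // null when the origin is unknown
        final long[] interval;        // simplified {start, end} commit log interval

        SSTableInfo(UUID originatingHostId, long start, long end) {
            this.originatingHostId = originatingHostId;
            this.interval = new long[] { start, end };
        }
    }

    static final class Result {
        final List<long[]> usableIntervals = new ArrayList<>();
        int skippedSSTables; // origin unknown or not the local node
    }

    // Keeps intervals only from sstables whose origin matches the local host id.
    static Result select(List<SSTableInfo> sstables, UUID localHostId) {
        Result result = new Result();
        for (SSTableInfo sstable : sstables) {
            if (localHostId.equals(sstable.originatingHostId)) {
                result.usableIntervals.add(sstable.interval);
            } else {
                // UUID.equals(null) is false, so an unknown origin is skipped too.
                result.skippedSSTables++;
            }
        }
        return result;
    }
}
```

In this sketch an sstable written before the field existed has a null origin and is therefore counted as skipped, which corresponds to the situation the comment above describes for pre-"me" SSTables after an upgrade.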


