
Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/03/04 04:51:40 UTC

[jira] [Resolved] (KUDU-969) Bootstrap may occasionally mis-identify previously flushed updates

     [ https://issues.apache.org/jira/browse/KUDU-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-969.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.8.0

> Bootstrap may occasionally mis-identify previously flushed updates
> ------------------------------------------------------------------
>
>                 Key: KUDU-969
>                 URL: https://issues.apache.org/jira/browse/KUDU-969
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.5.0, 0.6.0, 0.7.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.8.0
>
>
> tablet_bootstrap has the following TODO:
> {code}
>     if (!FindCopy(flushed_dms_by_drs_id_, target.rs_id(), &last_durable_dms_id)) {
>       // if we have no data about this RowSet, then it must have been flushed and
>       // then deleted.
>       // TODO: how do we avoid a race where we get an update on a rowset before
>       // it is persisted? add docs about the ordering of flush.
>       return true;
>     }
> {code}
> alter_table-randomized-test, when looped under TSAN, seems to fail after around 30 iterations with a sequence like:
> - a compaction enters its "duplicating" phase
> - an update arrives and is duplicated into both the old and the new rowset IDs
> -- the new rowset ID isn't part of the metadata yet
> - the process is killed with kill -9 before the metadata from the compaction is flushed
> On restart, bootstrap finds no metadata for the new rowset ID, so it seems to mis-identify the update to the "new" store as already flushed, which can cause the bootstrap to fail (or possibly cause a missing update).
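
The sequence above can be reproduced in miniature. The following is only a rough sketch, not the actual tablet_bootstrap code: FlushedDmsMap, WasUpdateAlreadyFlushed and ReplaySkipsUpdateToNewRowset are hypothetical names, a std::map stands in for flushed_dms_by_drs_id_, and the "dms_id <= last durable" comparison for the known-rowset case is an assumption, since the quoted snippet only shows the missing-entry branch.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for the map consulted during bootstrap:
// rowset ID -> ID of the last durable delta store for that rowset.
using FlushedDmsMap = std::map<int64_t, int64_t>;

// Mirrors the quoted check. A missing entry is taken to mean "flushed and
// then deleted", so the update is reported as already flushed.
bool WasUpdateAlreadyFlushed(const FlushedDmsMap& flushed_dms_by_drs_id,
                             int64_t rs_id, int64_t dms_id) {
  auto it = flushed_dms_by_drs_id.find(rs_id);
  if (it == flushed_dms_by_drs_id.end()) {
    // The ambiguous case from the TODO: the rowset may instead be one
    // whose metadata was never persisted.
    return true;
  }
  // Assumed rule for known rowsets: stores up to the last durable one
  // have already been flushed.
  return dms_id <= it->second;
}

// Replays the failure sequence: only the old rowset (ID 1) reached the
// on-disk metadata before the kill -9, and the update being replayed
// targets the compaction output (ID 2).
bool ReplaySkipsUpdateToNewRowset() {
  const FlushedDmsMap metadata_on_disk = {{1, 0}};
  const int64_t new_rs_id = 2;  // metadata for this rowset was never flushed
  return WasUpdateAlreadyFlushed(metadata_on_disk, new_rs_id, /*dms_id=*/0);
}
```

In this sketch ReplaySkipsUpdateToNewRowset() returns true, i.e. the update to the new rowset is treated as already flushed even though nothing about that rowset was ever persisted. The missing-entry branch alone cannot tell "flushed and then deleted" apart from "not yet persisted".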


