Posted to dev@phoenix.apache.org by "Istvan Toth (Jira)" <ji...@apache.org> on 2021/01/15 12:08:00 UTC

[jira] [Updated] (PHOENIX-5743) Concurrent read repairs on the same index row should be idempotent

     [ https://issues.apache.org/jira/browse/PHOENIX-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Toth updated PHOENIX-5743:
---------------------------------
    Fix Version/s: 5.1.0

> Concurrent read repairs on the same index row should be idempotent
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-5743
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5743
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.0.0, 4.14.3
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Critical
>             Fix For: 5.1.0
>
>         Attachments: PHOENIX-5743.4.x-HBase-1.3.001.patch, PHOENIX-5743.master.001.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Two or more read repairs can work on the same index row concurrently. Regardless of how many read repairs run concurrently on the row, the end result should be the same, i.e., read repair should be idempotent. The current implementation does not satisfy this property in one case, which can be reached through the following steps:
>  # An update on a data table row fails because the data table row write (the phase 2 write) fails. Since phase 1 (the unverified index write) has already completed, this leaves an unverified row in the index table.
>  # Two (or more) concurrent queries on this table scan this unverified index row.
>  # Each query triggers a separate read repair activity.
>  # The first one deletes the unverified row correctly.
>  # The subsequent ones may leave an incorrect delete marker, which corrupts this index row.
> Step 5 can happen because of two bugs in deleteRowIfAgedEnough() in GlobalIndexChecker.GlobalIndexScanner:
>  # "deleteRowScan.setTimeRange(0, ts + 1);" should read "deleteRowScan.setTimeRange(ts, ts + 1);". This ensures that the first read repair retrieves only the cells of the unverified row with timestamp ts, and that a subsequent read repair gets either the same set of cells as the first one or no cells (i.e., an empty row).
>  # If the unverified row has already been deleted, deleteRowIfAgedEnough() should do nothing and return. However, in the current implementation the read repair either retrieves the previous row version (i.e., the one preceding the unverified row) and leaves DeleteColumn markers for the wrong cells, or gets no cells (if no previous row version exists) and leaves a DeleteFamily marker, which will delete all previous versions of the row if such rows are inserted back by an index rebuild.
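
The effect of the two changes described above can be illustrated with a small in-memory sketch. This is a toy model of a single index row, not HBase or Phoenix code: the class ReadRepairSketch, its methods, the qualifiers "EMPTY" and "V", the timestamps 10 and 20, and the boolean flag "fixed" are all hypothetical names chosen for illustration, and a delete is modeled by simply removing the matched cells rather than writing real delete markers. The scan range is half-open, [minTs, maxTs), which is how the issue uses "ts + 1" as the upper bound.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy in-memory model of ONE index row; hypothetical, not HBase or Phoenix code.
final class ReadRepairSketch {
    /** Live cells of the row: timestamp -> column qualifiers written at that timestamp. */
    final TreeMap<Long, Set<String>> cells = new TreeMap<>();
    /** Timestamp of a DeleteFamily-style marker, or null if none was written. */
    Long familyDeleteTs = null;

    void put(long ts, String... qualifiers) {
        cells.computeIfAbsent(ts, k -> new TreeSet<>()).addAll(Arrays.asList(qualifiers));
    }

    /** Newest live cell per qualifier with minTs <= timestamp < maxTs. */
    Map<String, Long> scan(long minTs, long maxTs) {
        Map<String, Long> newest = new TreeMap<>();
        // Ascending iteration, so newer timestamps overwrite older ones.
        for (Map.Entry<Long, Set<String>> e : cells.subMap(minTs, maxTs).entrySet()) {
            for (String q : e.getValue()) {
                newest.put(q, e.getKey());
            }
        }
        return newest;
    }

    /** One read repair of the unverified version at ts. */
    void repair(long ts, boolean fixed) {
        // Assumption: 'fixed' selects the time range proposed in the issue.
        long minTs = fixed ? ts : 0L;
        Map<String, Long> found = scan(minTs, ts + 1);
        if (found.isEmpty()) {
            if (!fixed) {
                familyDeleteTs = ts; // old behaviour: row-wide delete marker
            }
            return; // fixed behaviour: already deleted, nothing to do
        }
        // Delete exactly the cells the scan returned (modeled as removal).
        for (Map.Entry<String, Long> e : found.entrySet()) {
            Set<String> version = cells.get(e.getValue());
            if (version != null) {
                version.remove(e.getKey());
            }
        }
    }

    /** Runs two repairs of the same unverified version, one after the other. */
    static ReadRepairSketch runTwoRepairs(boolean fixed, boolean withPreviousVersion) {
        ReadRepairSketch row = new ReadRepairSketch();
        if (withPreviousVersion) {
            row.put(10L, "EMPTY", "V"); // older, verified version
        }
        row.put(20L, "EMPTY", "V");     // unverified version left by the failed update
        row.repair(20L, fixed);         // first repair removes the unverified cells
        row.repair(20L, fixed);         // later repair of the same version
        return row;
    }

    public static void main(String[] args) {
        System.out.println(runTwoRepairs(false, true).scan(0L, Long.MAX_VALUE)); // {}
        System.out.println(runTwoRepairs(true, true).scan(0L, Long.MAX_VALUE));  // {EMPTY=10, V=10}
        System.out.println(runTwoRepairs(false, false).familyDeleteTs);          // 20
        System.out.println(runTwoRepairs(true, false).familyDeleteTs);           // null
    }
}
```

In this model, with the range starting at 0 the second repair finds the cells of the older version at timestamp 10 and removes them, or, when no older version exists, records a row-wide marker. With the range starting at ts the second repair finds nothing and returns, so the outcome is the same as after a single repair.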



--
This message was sent by Atlassian Jira
(v8.3.4#803005)