Posted to issues@phoenix.apache.org by "Kadir OZDEMIR (Jira)" <ji...@apache.org> on 2019/11/07 06:03:00 UTC
[jira] [Commented] (PHOENIX-5562) The last concurrent update can complete the last write phase

    [ https://issues.apache.org/jira/browse/PHOENIX-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968951#comment-16968951 ] 

Kadir OZDEMIR commented on PHOENIX-5562:
----------------------------------------

In the current implementation, we have two pending-rows sets: one at the table region level and one at the batch level. The region-level set holds one structure for each active row. This per-row structure includes the row key, the timestamp of the last update on the row, and the number of concurrent updates on the row (i.e., the reference count). The batch-level set is simply a set of row keys whose rows have at least one pending update at the time the batch is in the first phase. It turns out that we do not actually need the batch-level set. All we need is to check, after acquiring the row lock for the second time for a given update, whether that update is the last update for the row, using the region-level set. If so, the last phase is executed. Otherwise, it is skipped.
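
The check described above can be sketched roughly as follows. This is only an illustration, not the Phoenix implementation: the class and method names (PendingRows, PendingRow, add, removeAndCheckLast) are hypothetical, row keys are modeled as plain strings, and "last update" is assumed to mean the update that brings the row's reference count down to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a region-level pending-rows set.
// None of these names are taken from the Phoenix code base.
class PendingRows {
    // Per-row structure: timestamp of the last update and reference count.
    static final class PendingRow {
        long lastTimestamp;
        int count;
    }

    // Keyed by row key (a String here for simplicity; an assumption).
    private final Map<String, PendingRow> rows = new HashMap<>();

    // First phase: called while holding the row lock taken to read
    // the existing data table row. Registers one more pending update.
    synchronized void add(String rowKey, long timestamp) {
        PendingRow row = rows.computeIfAbsent(rowKey, k -> new PendingRow());
        row.count++;
        row.lastTimestamp = Math.max(row.lastTimestamp, timestamp);
    }

    // Called after acquiring the row lock for the second time.
    // Returns true when the caller is the last pending update on the row
    // (assumed here: the reference count drops to zero), in which case
    // the caller would go on to execute the last phase.
    synchronized boolean removeAndCheckLast(String rowKey) {
        PendingRow row = rows.get(rowKey);
        if (row == null) {
            throw new IllegalStateException("no pending update for " + rowKey);
        }
        row.count--;
        if (row.count == 0) {
            rows.remove(rowKey);
            return true;
        }
        return false;
    }

    // Number of rows that currently have at least one pending update.
    synchronized int size() {
        return rows.size();
    }
}
```

With two concurrent updates on the same row, the first one to reacquire the lock sees a non-zero count and skips the last phase, while the second one sees the count reach zero and executes it. No batch-level set is consulted in this sketch.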

> The last concurrent update can complete the last write phase
> ------------------------------------------------------------
>
>                 Key: PHOENIX-5562
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5562
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.0
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>
> From the consistent secondary indexing design (PHOENIX-5156) perspective, two or more updates on the same row are concurrent updates if and only if all of them have acquired the row lock for reading the data table row before any of them acquires the row lock the second time for updating the data table. In other words, all of them are in the first update phase concurrently.
> In the current implementation, these updates can detect the existence of each other in two places:
> (1) after acquiring the row lock to read the existing row on the data table
> (2) after acquiring the row lock to update the data table
> This allows all the concurrent updates to detect each other and complete the first two update phases but skip the last update phase. This means the data table row will be updated by these updates, but the corresponding index table rows will be left in the unverified status. Then, the read repair process will repair these unverified index rows during scans. Although this behavior leads to the correct end result, ideally the concurrent update with the most recent timestamp would proceed with the last phase instead of leaving the index rows in the unverified status. This would reduce the number of unverified rows due to concurrent updates.


