Posted to issues@phoenix.apache.org by "Kadir OZDEMIR (Jira)" <ji...@apache.org> on 2019/11/07 11:24:00 UTC
[jira] [Comment Edited] (PHOENIX-5562) Simplify detection of concurrent updates on data tables with indexes

    [ https://issues.apache.org/jira/browse/PHOENIX-5562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968951#comment-16968951 ] 

Kadir OZDEMIR edited comment on PHOENIX-5562 at 11/7/19 11:23 AM:
------------------------------------------------------------------

In the current implementation, we have two pending-rows sets: one at the table region level and one at the batch level. The region-level set holds one structure for each active row. This per-row structure includes the row key, the timestamp of the last update on the row, and the number of concurrent updates on the row (i.e., the reference count). The batch-level set is simply a set of row keys whose rows have at least one pending update while this batch is in the first phase. It turns out that we do not actually need the batch-level set. All we need is, after acquiring the lock for the second time for a given update, to check the region-level set for concurrent updates on the row. If there are any, the last phase is skipped.
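
As a rough illustration of the region-level set and the single check described above, here is a minimal sketch. The class and method names (PendingRows, addPending, hasConcurrentUpdates, removePending) are hypothetical and not taken from the Phoenix code base. Row keys are modeled as String for brevity, whereas HBase row keys are byte arrays. The point at which a reference is released is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical region-level pending-rows set. All names here are
 * illustrative assumptions, not Phoenix's actual implementation.
 */
final class PendingRows {
    /** Per-row structure: last update timestamp and reference count. */
    private static final class PendingRow {
        long lastUpdateTimestamp;
        int referenceCount;
    }

    // Keyed by row key (String here for simplicity; byte[] in HBase).
    private final Map<String, PendingRow> rows = new HashMap<>();

    /** First phase: register the update while holding the row lock for reading. */
    synchronized void addPending(String rowKey, long timestamp) {
        PendingRow row = rows.computeIfAbsent(rowKey, k -> new PendingRow());
        row.referenceCount++;
        row.lastUpdateTimestamp = Math.max(row.lastUpdateTimestamp, timestamp);
    }

    /**
     * The single check, made after acquiring the row lock the second time.
     * Returns true if another update on the same row is still pending,
     * in which case the caller would skip the last update phase.
     */
    synchronized boolean hasConcurrentUpdates(String rowKey) {
        PendingRow row = rows.get(rowKey);
        return row != null && row.referenceCount > 1;
    }

    /** Release the reference once the update is done (timing is an assumption). */
    synchronized void removePending(String rowKey) {
        PendingRow row = rows.get(rowKey);
        if (row != null && --row.referenceCount == 0) {
            rows.remove(rowKey);
        }
    }
}
```

In this sketch, two updates that both call addPending("row1", ...) before either releases its reference will each see hasConcurrentUpdates("row1") return true and skip the last phase, leaving the index rows unverified for read repair. No batch-level set is consulted.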


was (Author: kozdemir):
In the current implementation, we have two pending-rows sets: one at the table region level and one at the batch level. The region-level set holds one structure for each active row. This per-row structure includes the row key, the timestamp of the last update on the row, and the number of concurrent updates on the row (i.e., the reference count). The batch-level set is simply a set of row keys whose rows have at least one pending update while this batch is in the first phase. It turns out that we do not actually need the batch-level set. All we need is, after acquiring the lock for the second time for a given update, to check the region-level set to see whether the update is the last update for the row. If so, the last phase is executed. Otherwise, it is skipped.

> Simplify detection of concurrent updates on data tables with indexes
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-5562
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5562
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.1.0
>            Reporter: Kadir OZDEMIR
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> From the perspective of the consistent secondary indexing design (PHOENIX-5156), two or more updates on the same row are concurrent updates if and only if all of them have acquired the row lock for reading the data table row before any of them acquires the row lock the second time for updating the data table. In other words, all of them are in the first update phase concurrently.
> In the current implementation, these updates can detect the existence of each other in two places:
> (1) after acquiring the lock to read the existing row on the data table 
> (2) after acquiring the row lock to update the data table
> This allows all the concurrent updates to detect each other and complete the first two update phases but skip the last update phase. This means the data table row will be updated by these updates, but the corresponding index table rows will be left with the unverified status. The read repair process will then repair these unverified index rows during scans.
> The detection of concurrent updates can be simplified and done in one place, i.e., after acquiring the row lock to update the data table.


