Posted to dev@phoenix.apache.org by "Thomas D'Silva (JIRA)" <ji...@apache.org> on 2016/01/22 23:20:39 UTC

[jira] [Comment Edited] (PHOENIX-2582) Creating an index while a batch of rows is being written leads to missing rows in the index table

    [ https://issues.apache.org/jira/browse/PHOENIX-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113226#comment-15113226 ] 

Thomas D'Silva edited comment on PHOENIX-2582 at 1/22/16 10:20 PM:
-------------------------------------------------------------------

Attaching a possible solution from an email conversation with [~apurtell]

In lieu of an (external) transaction manager, maybe you could run a Procedure that must complete before the index create is declared successful? Procedure is HBase's internal coordination framework. HBase 0.98 and 1.0 have ProcedureV1. HBase 1.1+ has ProcedureV2.

Your procedure workers would set the writestate on each region to readonly, wait for in-flight writes to finish, and then join the barrier. Once inside the barrier your workers could make the index-related state changes, or just return if no further work is needed. Your procedure workers would reset writestate in the cleanup callback. Your coordinator (in the master) can wait on a monitor for global completion or poll on a completion status check. Note that Procedures will complete in either a successful or a failed state. Failure may be explicit (a worker posted a failure notice) or a timeout. If a Procedure fails, you'll need to retry. Once one of these has completed successfully, you would be good.
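
The steps above can be sketched with plain java.util.concurrent primitives. This is only an illustration of the barrier protocol, not the HBase Procedure API: the names IndexBarrierSketch, Region, blockWritesAndDrain, resetWriteState, runOnce and runWithRetry are all hypothetical, and a thread pool stands in for the per-region-server procedure workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types are HBase or Phoenix classes.
final class IndexBarrierSketch {

    /** Stand-in for one region's write state. */
    static final class Region {
        private final AtomicBoolean readOnly = new AtomicBoolean(false);
        private final AtomicInteger inFlight = new AtomicInteger(0);

        /** Writers register first, then check the flag, so a drain never misses them. */
        boolean tryBeginWrite() {
            inFlight.incrementAndGet();
            if (readOnly.get()) {
                inFlight.decrementAndGet();
                return false;
            }
            return true;
        }

        void endWrite() { inFlight.decrementAndGet(); }

        boolean isReadOnly() { return readOnly.get(); }

        /** Blocks new writes, then waits for in-flight writes to finish. */
        void blockWritesAndDrain(long deadlineNanos) throws InterruptedException, TimeoutException {
            readOnly.set(true);
            while (inFlight.get() > 0) {
                if (System.nanoTime() - deadlineNanos > 0) {
                    throw new TimeoutException("in-flight writes did not drain");
                }
                Thread.sleep(1);
            }
        }

        void resetWriteState() { readOnly.set(false); }
    }

    /** One attempt: true on success, false on explicit failure or timeout. */
    static boolean runOnce(List<Region> regions, Runnable indexStateChange, long timeoutMs) {
        if (regions.isEmpty()) {
            indexStateChange.run();
            return true;
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        // The barrier action runs once, after every worker arrives and before any is released.
        CyclicBarrier barrier = new CyclicBarrier(regions.size(), indexStateChange);
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<Void>> results = new ArrayList<>();
            for (Region region : regions) {
                Callable<Void> worker = () -> {
                    try {
                        region.blockWritesAndDrain(deadline);
                        barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                        return null;
                    } finally {
                        region.resetWriteState(); // plays the role of the cleanup callback
                    }
                };
                results.add(pool.submit(worker));
            }
            for (Future<Void> result : results) {
                result.get(); // coordinator waits for global completion
            }
            return true;
        } catch (ExecutionException e) {
            return false; // a worker failed explicitly or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
            try {
                pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS); // let cleanup finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Retries failed attempts until one succeeds or maxAttempts is reached. */
    static boolean runWithRetry(List<Region> regions, Runnable indexStateChange,
                                long timeoutMs, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (runOnce(regions, indexStateChange, timeoutMs)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch the index state change is passed as the barrier action, so it runs exactly once per successful attempt while every region is still readonly. A real implementation would need to coordinate across region servers rather than threads in one JVM.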


was (Author: tdsilva):
Attaching a possible solution from an email conversation with [~apurtell]

>In lieu of an (external) transaction manager, maybe you could run a Procedure that must complete before the index create is declared successful? Procedure is HBase's internal coordination framework. HBase 0.98 and 1.0 have ProcedureV1. HBase 1.1+ has ProcedureV2.
>
>Your procedure workers would set the writestate on each region to readonly, wait for in flight writes to finish, and then join the barrier. Once inside the barrier your workers could make the index related state changes, or just return if no further work needed. Your procedure workers would reset writestate in the cleanup callback. Your coordinator (in the master) can wait on a monitor for global completion or poll on a completion status check. Note Procedures will complete in either successful or failed state. Failure may be explicit (worker posted failure notice) or a timeout. If failed, you'll need to retry. Once one of these has completed successfully, you would be good.

> Creating an index while a batch of rows is being written leads to missing rows in the index table
> -------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2582
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2582
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Thomas D'Silva
>
> If we create an index while we are upserting rows to the table, it's possible that we miss writing the corresponding rows to the index table.
> If a region server is writing a batch of rows and we create an index just before the batch is written, we will miss writing that batch to the index table. This is because we run the initial UPSERT SELECT to populate the index with an SCN that we get from the server, which will be before the timestamp at which the batch of rows is written.
> We need to figure out if there is a way to determine that all pending batches have been written before running the UPSERT SELECT to do the initial index population.
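
The race described in the issue can be illustrated with a toy model. This is a simplified sketch, not Phoenix code: ScnRaceSketch, populateIndex and missingFromIndex are hypothetical names, the data table is reduced to a map from write timestamp to row key, and the exact boundary handling of an SCN is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of the data table: write timestamp -> row key.
final class ScnRaceSketch {

    /** Initial index population: only sees rows written before the SCN. */
    static List<String> populateIndex(TreeMap<Long, String> dataTable, long scn) {
        return new ArrayList<>(dataTable.headMap(scn).values());
    }

    /** Rows that exist in the data table but never reached the index. */
    static List<String> missingFromIndex(TreeMap<Long, String> dataTable, List<String> index) {
        List<String> missing = new ArrayList<>(dataTable.values());
        missing.removeAll(index);
        return missing;
    }
}
```

For example, with rows written at timestamps 10 and 20, an SCN of 25, and an in-flight batch that lands at timestamp 30 without index maintenance (it was prepared before the index existed), populateIndex sees only the first two rows and missingFromIndex reports the third.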



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)