Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/12/02 03:50:11 UTC

[jira] [Commented] (PHOENIX-2478) Rows committed in transaction overlapping index creation are not populated

    [ https://issues.apache.org/jira/browse/PHOENIX-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035168#comment-15035168 ] 

James Taylor commented on PHOENIX-2478:
---------------------------------------

I can think of a few potential solutions:
# Special case the call to check if the schema is up to date at commit time to use the write pointer instead of the read pointer. In this case, the index would be noticed since it was added after the read pointer, but before the write pointer. This solution has several drawbacks:
#* an extra RPC would be required at commit time
#* a potential race condition exists between the index creation and the commit (there'd likely still be some potential for not seeing the index, as we're seeing this occur for non-transactional tables sometimes - see PHOENIX-2446)
#* given that we want to make our DDL commands transactional, this would be an issue again at that point
# Hold off on building and marking an index as active for the transaction timeout period to give in-progress transactions a chance to finish. This is obviously not ideal, especially when creating an index over a small or empty table.
# Enhance Tephra's conflict detection to help with this.
#* One way would be to have a kind of "wildcard" row key that we could put in the change set for a table which would conflict with any rows that overlap the read pointer and write pointer timespan. The CREATE INDEX call would add this wildcard and our commit logic could handle the exception that would occur by resubmitting the commit (with the updated metadata).
#* Another possibility would be to allow an entry in the change set to be declared as a read versus a write. A read/read conflict would be allowed while a read/write or write/write wouldn't. Phoenix could use this by adding the metadata table row involved in the DML command to the change set as a "read" and the DDL command of creating an index on a table as a "write". Then we'd get an exception which we could react to if DML is being done on a table at the same time as DDL (i.e. index creation), but we wouldn't if two simultaneous DML commands are executed (we'd still get a conflict, of course, if the row keys of the data being mutated overlapped).
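
To make the read-versus-write idea concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: it is not Tephra's API, the names (Access, Txn, ConflictDetector, the "meta:T" key) are all hypothetical, and a simple counter stands in for the read and write pointers. A commit is rejected when a transaction that committed after our read pointer touched the same key and at least one of the two accesses was a write.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass
class Txn:
    """Hypothetical transaction: a read pointer plus a typed change set."""
    read_pointer: int
    change_set: dict = field(default_factory=dict)  # key -> Access

    def declare(self, key, access):
        # A WRITE already declared on a key is never downgraded to a READ.
        if self.change_set.get(key) is not Access.WRITE:
            self.change_set[key] = access


class ConflictDetector:
    """Toy stand-in for a transaction manager (not Tephra's real API)."""

    def __init__(self):
        self._clock = 0
        self._committed = []  # list of (commit_ts, change_set)

    def start(self):
        return Txn(read_pointer=self._clock)

    def can_commit(self, txn):
        for commit_ts, other in self._committed:
            if commit_ts <= txn.read_pointer:
                continue  # committed before we started: visible, no conflict
            for key, access in txn.change_set.items():
                other_access = other.get(key)
                if other_access is None:
                    continue
                # read/read is allowed; read/write and write/write conflict
                if access is Access.WRITE or other_access is Access.WRITE:
                    return False
        return True

    def commit(self, txn):
        if not self.can_commit(txn):
            return False
        self._clock += 1
        self._committed.append((self._clock, dict(txn.change_set)))
        return True


detector = ConflictDetector()

# Two overlapping DML transactions on different rows of table T.
dml_a, dml_b = detector.start(), detector.start()
for txn, row in ((dml_a, "T:row1"), (dml_b, "T:row2")):
    txn.declare("meta:T", Access.READ)   # DML reads the table metadata row
    txn.declare(row, Access.WRITE)
a_ok = detector.commit(dml_a)
b_ok = detector.commit(dml_b)

# A DML transaction that overlaps a CREATE INDEX on the same table.
dml_c, ddl = detector.start(), detector.start()
ddl.declare("meta:T", Access.WRITE)      # DDL writes the table metadata row
ddl_ok = detector.commit(ddl)
dml_c.declare("meta:T", Access.READ)
dml_c.declare("T:row3", Access.WRITE)
c_ok = detector.commit(dml_c)            # rejected: read/write on meta:T

# Resubmit in a new transaction that starts after the index exists.
retry = detector.start()
retry.declare("meta:T", Access.READ)
retry.declare("T:row3", Access.WRITE)
retry_ok = detector.commit(retry)
```

In this model the two simultaneous DML commits both succeed, the DML that overlaps the index creation is rejected, and the resubmitted DML (which would pick up the updated metadata) succeeds. The wildcard row key variant could be modeled the same way by treating one special key as matching every row of the table.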

Seems like either of the last alternatives is what we really need to make DDL play nicely with DML. Any thoughts/ideas [~poornachandra], [~tdsilva]?

> Rows committed in transaction overlapping index creation are not populated
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-2478
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2478
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> For a reproducible case, see IndexIT.testCreateIndexAfterUpsertStarted() and the associated FIXME comments for PHOENIX-2446.
> The case that is failing is when a commit starts before an index exists, but commits after the index build is completed. For transactional data, this is problematic because the index gets a timestamp after the commit of the data table mutation and thus these mutations won't be seen during the commit. Also, when the index is being built, the data hasn't yet been committed and thus won't be part of the initial index build.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)