Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2016/02/05 10:23:39 UTC

[jira] [Commented] (PHOENIX-2635) Partial index rebuild doesn't delete prior index row

    [ https://issues.apache.org/jira/browse/PHOENIX-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133888#comment-15133888 ] 

James Taylor commented on PHOENIX-2635:
---------------------------------------

In looking closer at the existing mechanism, I think it can be salvaged without too many changes. Since the thread that does the partial rebuild is associated with the SYSTEM.CATALOG table and there's only a single region of this table (though that won't always be the case), the code can continue to live in MetaDataRegionObserver.

Here are the changes that are necessary:
- Have a way of opening a PhoenixConnection that won't trigger an upgrade. There may be a new Phoenix server jar that was deployed but running against old Phoenix client jars (in which case triggering an upgrade would be premature). The upgrade is triggered in ConnectionQueryServicesImpl.init().
- Run a raw scan over the data table specifying the min time range based on the INDEX_DISABLE_TIMESTAMP timestamp. You need a raw scan because you need to replay the mutations that occurred (including Deletes).
- Replay all mutations as-is by just submitting them again through a batchMutation, but make sure to attach the required index metadata to them. This will trigger the coprocessors for doing index maintenance. See the MutationState.send() method for how to get the index maintainer metadata to attach:
{code}
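        // org.apache.hadoop.hbase.io.ImmutableBytesWritable; receives the serialized index maintainers
        ImmutableBytesWritable indexMetaDataPtr = new ImmutableBytesWritable();
        PTable dataTable = PhoenixRuntime.getTable(connection, fullTableNameOfDataTable);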
        dataTable.getIndexMaintainers(indexMetaDataPtr, connection);
        byte[] attribValue = ByteUtil.copyKeyBytesIfNecessary(indexMetaDataPtr);
        byte[] uuidValue = ServerCacheClient.generateId();

        for (Mutation mutation : mutations) { // attach metadata to every mutation
            mutation.setAttribute(PhoenixIndexCodec.INDEX_UUID, uuidValue);
            mutation.setAttribute(PhoenixIndexCodec.INDEX_MD, attribValue);
        }
{code}
- It seems to me that you wouldn't need the INDEX_FAILURE_HANDLING_REBUILD_OVERLAP_TIME_ATTRIB. You'd just need to use INDEX_DISABLE_TIMESTAMP-1 as the min time range.
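
To make the difference between the two rebuild strategies concrete, here is a minimal stand-alone sketch. It deliberately uses no Phoenix or HBase APIs: the Mutation class, the "value|rowKey" index row encoding, and every method name are hypothetical, and the prior row state is tracked in a map rather than read from the data table the way the coprocessors would. It only illustrates why upserting current values leaves stale index rows while replaying mutations from INDEX_DISABLE_TIMESTAMP-1 onward does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PartialRebuildSketch {
    /** Hypothetical simplified mutation: a put of the indexed column, or a row delete when value is null. */
    static final class Mutation {
        final String rowKey;
        final String value;
        final long ts;
        Mutation(String rowKey, String value, long ts) {
            this.rowKey = rowKey;
            this.value = value;
            this.ts = ts;
        }
    }

    /** Hypothetical index row encoding: indexed value first, then the data row key. */
    static String indexRow(String value, String rowKey) {
        return value + "|" + rowKey;
    }

    /** Data row state (rowKey -> indexed value) after applying, in log order, every mutation with ts < upTo. */
    static Map<String, String> stateBefore(List<Mutation> log, long upTo) {
        Map<String, String> state = new HashMap<>();
        for (Mutation m : log) {
            if (m.ts >= upTo) continue;
            if (m.value == null) state.remove(m.rowKey); else state.put(m.rowKey, m.value);
        }
        return state;
    }

    /** Strategy described in the issue: only insert index rows for the current data row values. */
    static Set<String> rebuildFromCurrentValues(Set<String> index, List<Mutation> log) {
        Set<String> result = new TreeSet<>(index);
        for (Map.Entry<String, String> e : stateBefore(log, Long.MAX_VALUE).entrySet()) {
            result.add(indexRow(e.getValue(), e.getKey()));
        }
        return result;
    }

    /** Proposed strategy: replay every mutation (puts and deletes) with ts >= disableTs - 1. */
    static Set<String> rebuildByReplay(Set<String> index, List<Mutation> log, long disableTs) {
        long minTs = disableTs - 1;
        Set<String> result = new TreeSet<>(index);
        Map<String, String> state = stateBefore(log, minTs); // index is assumed consistent up to here
        for (Mutation m : log) {
            if (m.ts < minTs) continue;
            // Index maintenance: find the prior value, delete its index row, then add the new one.
            String prior = (m.value == null) ? state.remove(m.rowKey) : state.put(m.rowKey, m.value);
            if (prior != null) result.remove(indexRow(prior, m.rowKey));
            if (m.value != null) result.add(indexRow(m.value, m.rowKey));
        }
        return result;
    }

    /** Sample log, ordered by timestamp; the index is assumed to have been disabled at ts=20. */
    static List<Mutation> sampleLog() {
        return Arrays.asList(
            new Mutation("row1", "a", 10L),
            new Mutation("row2", "c", 10L),
            new Mutation("row1", "b", 20L),   // update that never reached the index
            new Mutation("row2", null, 30L)); // delete that never reached the index
    }

    /** Index contents at the time it was disabled. */
    static Set<String> sampleIndex() {
        return new TreeSet<>(Arrays.asList("a|row1", "c|row2"));
    }

    public static void main(String[] args) {
        System.out.println("current values only: " + rebuildFromCurrentValues(sampleIndex(), sampleLog()));
        System.out.println("replay:              " + rebuildByReplay(sampleIndex(), sampleLog(), 20L));
    }
}
```

With the sample data, the current-values strategy adds b|row1 but leaves a|row1 and c|row2 behind, while the replay ends up with only b|row1. The delete of row2 is what makes the raw scan necessary: a regular scan would not return it, so there would be nothing to replay for that row.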

> Partial index rebuild doesn't delete prior index row
> ----------------------------------------------------
>
>                 Key: PHOENIX-2635
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2635
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> The partial rebuild index feature for mutable secondary indexes does not do the correct index maintenance. We currently only insert the new index rows based on the current data row values which would not correctly remove the previous index row (thus leading to an invalid index). Instead, we should replay the data row mutations so that the coprocessors generate the correct deletes and updates.
> Also, instead of *every* region running the partial index rebuild, we should have each region replay only its own data mutations so that we're not duplicating work.
> A third (and perhaps most serious) issue is that the partial index rebuild could trigger the upgrade code before a cluster is ready to be upgraded. We'll definitely want to prevent that.


