Posted to issues@phoenix.apache.org by "Kadir OZDEMIR (Jira)" <ji...@apache.org> on 2019/09/15 04:05:00 UTC
[jira] [Comment Edited] (PHOENIX-5027) PhoenixIndexImportDirectMapper retried mappers can succeed without inserting all index data

    [ https://issues.apache.org/jira/browse/PHOENIX-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929878#comment-16929878 ] 

Kadir OZDEMIR edited comment on PHOENIX-5027 at 9/15/19 4:04 AM:
-----------------------------------------------------------------

[~gjacoby] and [~vincentpoon], regarding the scenario in the description of this bug, I think the following happened:

1. DROP/CREATE put the index in the BUILDING state.

2. During heavy splitting, both index rebuild writes and regular writes failed after retries. The rebuild write failures put the index in the DISABLED state due to PHOENIX-5473. If the index were mutable, the regular write failures would also have put the index in the DISABLED state.

3. Failed mapper tasks were retried and succeeded.

4. Regular writes continued to update the data table, but the corresponding index updates were skipped because the index was in the DISABLED state.

5. Finally, the MR job completed and changed the index state to ACTIVE.

6. RowCounter and IndexScrutinyTool showed millions of rows missing from the index, with keys implying they belonged to the failed mappers. The rows were missing because their index updates were skipped during the regular data table writes (i.e., because of step 2).
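To make steps 1 to 6 concrete, here is a deliberately simplified, single-threaded Java model of the timeline. Everything in it (the IndexRaceSketch class, its IndexState enum, regularWrite, missingIndexRows) is hypothetical and not Phoenix code; it only assumes what the steps say: regular writes skip the index while it is DISABLED, and job completion sets ACTIVE unconditionally.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the timeline in steps 1-6.
 * None of these names are Phoenix APIs.
 */
public class IndexRaceSketch {

    enum IndexState { BUILDING, ACTIVE, DISABLED }

    static IndexState state;
    static final List<String> dataTable = new ArrayList<>();
    static final List<String> indexTable = new ArrayList<>();

    // A regular write always updates the data table, but the index row is
    // only written while the index is not DISABLED (step 4).
    static void regularWrite(String row) {
        dataTable.add(row);
        if (state != IndexState.DISABLED) {
            indexTable.add(row);
        }
    }

    // Counts data rows that have no matching index row.
    static long missingIndexRows() {
        return dataTable.stream().filter(r -> !indexTable.contains(r)).count();
    }

    public static void main(String[] args) {
        state = IndexState.BUILDING;   // step 1: DROP/CREATE
        regularWrite("row1");          // index row written
        state = IndexState.DISABLED;   // step 2: write failures disable the index
        // step 3: retried mappers succeed (not modeled here)
        regularWrite("row2");          // step 4: index update skipped
        regularWrite("row3");          // step 4: index update skipped
        state = IndexState.ACTIVE;     // step 5: unconditional transition to ACTIVE
        System.out.println(state + ", missing=" + missingIndexRows());
    }
}
```

Running main prints `ACTIVE, missing=2`: the index is reported as ACTIVE even though the two rows written while it was DISABLED never reached it.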

Suggested solution: besides fixing PHOENIX-5473, IndexTool should check the current index state and set the new state based on it. Actually, this should be done everywhere the index state changes: index state transitions should be atomic.
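One way to read "set the new state based on the current state" is as a compare-and-set. The Java sketch below is hypothetical (the class and method names are made up, not Phoenix APIs) and uses an in-memory AtomicReference. Phoenix persists index state in SYSTEM.CATALOG, so a real fix would presumably need an equivalent conditional update on that table instead.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of an atomic, state-aware index state transition.
 * Not Phoenix code: an in-memory stand-in for a conditional catalog update.
 */
public class IndexStateTransitionSketch {

    enum IndexState { BUILDING, ACTIVE, DISABLED }

    private final AtomicReference<IndexState> state;

    IndexStateTransitionSketch(IndexState initial) {
        this.state = new AtomicReference<>(initial);
    }

    IndexState current() {
        return state.get();
    }

    /**
     * Called when the rebuild job completes. The index becomes ACTIVE only if
     * it is still BUILDING; if it was DISABLED in the meantime, the transition
     * is refused and the method returns false.
     */
    boolean markRebuildComplete() {
        return state.compareAndSet(IndexState.BUILDING, IndexState.ACTIVE);
    }

    /** Write failures may disable the index from any state. */
    void disable() {
        state.set(IndexState.DISABLED);
    }

    public static void main(String[] args) {
        IndexStateTransitionSketch ok = new IndexStateTransitionSketch(IndexState.BUILDING);
        System.out.println(ok.markRebuildComplete() + " " + ok.current()); // true ACTIVE

        IndexStateTransitionSketch raced = new IndexStateTransitionSketch(IndexState.BUILDING);
        raced.disable(); // concurrent write failure, as in step 2
        System.out.println(raced.markRebuildComplete() + " " + raced.current()); // false DISABLED
    }
}
```

Under this rule, step 5 of the scenario would leave the index DISABLED instead of marking it ACTIVE, so the gap would show up as a disabled index rather than as silently missing rows.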

Note that the new index design is immune to this type of problem, as it never disables indexes.


was (Author: kozdemir):
[~gjacoby] and [~vincentpoon], regarding the scenario in the description of this bug, I think the following happened:

1. DROP/CREATE put the index in the BUILDING state.

2. During heavy splitting, both index rebuild writes and regular writes failed after retries. The regular write failures put the index in the DISABLED state.

3. Failed mapper tasks were retried and succeeded.

4. Regular writes continued to update the data table, but the corresponding index updates were skipped because the index was in the DISABLED state.

5. Finally, the MR job completed and changed the index state to ACTIVE.

6. RowCounter and IndexScrutinyTool showed millions of rows missing from the index, with keys implying they belonged to the failed mappers. The rows were missing because their index updates were skipped during the regular data table writes (i.e., because of step 2).

Suggested solution: IndexTool should check the current index state and set the new state based on it. Actually, this should be done everywhere the index state changes: index state transitions should be atomic.

Note that the new index design is immune to this type of problem, as it never disables indexes.

> PhoenixIndexImportDirectMapper retried mappers can succeed without inserting all index data
> -------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5027
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5027
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>
> On two recent occasions I've rebuilt a large global immutable index by doing a DROP/CREATE and ended up with missing index data, though it doesn't happen every time. Here's what happened:
> 1. PhoenixMRJobSubmitter correctly detects the index rebuild is necessary, and invokes IndexTool.
> 2. IndexTool enqueues a MapReduce job using PhoenixIndexImportDirectMapper.
> 3. Some mappers fail because of timeouts due to heavy splitting on the new index table.
> 4. Those mappers are retried and succeed. The MR job as a whole completes successfully.
> 5. RowCounter and IndexScrutinyTool show millions of rows are missing from the index, with keys that imply they were part of the failed mappers.
> Aside from the timestamp glitch I pointed out in PHOENIX-5018, the code in PhoenixIndexImportDirectMapper _looks_ idempotent on a rerun, so I've been struggling to find the cause of the missing index data.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)