Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2017/05/11 23:51:04 UTC

[jira] [Updated] (PHOENIX-3847) Handle out of order rows during index maintenance

     [ https://issues.apache.org/jira/browse/PHOENIX-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-3847:
----------------------------------
    Description: 
Based on the investigation and work done in PHOENIX-3825, plus the existence of the ignoreNewerMutations flag, it seems that out-of-order rows are not handled correctly during index maintenance. When users replay failed batches, we currently force them to submit the batches in timestamp order. As long as the user provides the original timestamp, the order shouldn't matter: regardless of the order in which the server processes data table mutations, the resulting index rows should be the same and should be based purely on the cell timestamps of the data rows. Ideally, we shouldn't need the ignoreNewerMutations flag at all. Perhaps that was the intent of IndexUpdateManager.fixUpCurrentUpdates(), but it doesn't appear to be working.
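
For example, a client replaying a failed batch at its original timestamp might look something like the sketch below, which pins the timestamp via the CurrentSCN connection property; the table and column names (T, PK, V) and the JDBC URL are made up for illustration:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

public class ReplayAtOriginalTimestamp {
    // Replays a single UPSERT at the timestamp the row originally carried, so
    // that the data table cells (and thus the derived index rows) end up with
    // the same timestamps no matter when, or in what order, the replay happens.
    public static void replay(String url, long originalTimestamp, int pk, String value) throws Exception {
        Properties props = new Properties();
        // CurrentSCN pins the timestamp Phoenix uses for the mutation's cells.
        props.setProperty("CurrentSCN", Long.toString(originalTimestamp));
        try (Connection conn = DriverManager.getConnection(url, props);
             PreparedStatement stmt = conn.prepareStatement("UPSERT INTO T(PK, V) VALUES(?, ?)")) {
            stmt.setInt(1, pk);
            stmt.setString(2, value);
            stmt.executeUpdate();
            conn.commit();
        }
    }
}
{code}

With the timestamp pinned this way, replaying the batches in any order should produce the same data table cells, which is the property the index maintenance code would need to preserve.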

Would it work to simply generate all the index rows for the mutating data rows, across all of their versions? We should walk through a series of examples to see if this would work. For example, with the following data table:

|Type|RowKey|Value|Timestamp
| Put | 1 | A | 1000
| Put | 1 | C | 3000

the index table would look like this:

|Type|RowKey|Timestamp
| Put | A,1 | 1000
| Del | A,1 | 3000
| Put | C,1 | 3000
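
In raw HBase terms, those three index mutations could be written roughly as follows (a simplified sketch: the index table name IDX, the column family and empty key value names, and the "value,rowkey" encoding of the index row key are stand-ins for the real Phoenix encoding):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexMutationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table index = conn.getTable(TableName.valueOf("IDX"))) {
            byte[] cf = Bytes.toBytes("0");
            byte[] emptyKv = Bytes.toBytes("_0");

            // Put A,1 @ 1000: index row for the original value A.
            Put putA = new Put(Bytes.toBytes("A,1"));
            putA.addColumn(cf, emptyKv, 1000L, Bytes.toBytes(""));

            // Del A,1 @ 3000: the A index row is no longer current once the value becomes C.
            Delete delA = new Delete(Bytes.toBytes("A,1"), 3000L);

            // Put C,1 @ 3000: index row for the new value C.
            Put putC = new Put(Bytes.toBytes("C,1"));
            putC.addColumn(cf, emptyKv, 3000L, Bytes.toBytes(""));

            index.put(putA);
            index.delete(delA);
            index.put(putC);
        }
    }
}
{code}

The only point of the sketch is that every index mutation carries an explicit timestamp taken from the data row's cells, never the server's current time.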

Then if a Put comes in out of order at 2000, the data table would look like this:

|Type|RowKey|Value|Timestamp
| Put | 1 | A | 1000
| Put | 1 | B | 2000
| Put | 1 | C | 3000

and the index table should look like this:

|Type|RowKey|Timestamp
| Put | A,1 | 1000
| Del | A,1 | 2000
| Put | B,1 | 2000
| Del | B,1 | 3000
| Put | C,1 | 3000

Given that we can't reverse Delete markers, I'm not sure we can get there completely: we'd still have the original Delete of A,1 @ 3000. But since A,1 is already deleted at 2000, perhaps that leftover marker is not a problem? We'd need to play this out further and include scenarios with row deletes as well.
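
To make the "generate the index rows for all versions" idea concrete, here's a rough sketch of the computation in plain Java (not tied to the actual IndexMaintainer or IndexUpdateManager code): given every version of the indexed value for one data row, sort by cell timestamp, emit an index Put at each version's timestamp, and emit a Delete of the previous version's index row at the timestamp it was replaced. Run over the three versions above, it produces exactly the five index mutations listed, regardless of the order the versions arrived in:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RegenerateIndexUpdates {
    // One version of the data row: the indexed value at a given cell timestamp.
    record Version(long ts, String value) {}

    // A logical index mutation: Put or Del of "value,rowKey" at a timestamp.
    record IndexMutation(String type, String rowKey, long ts) {
        @Override
        public String toString() { return type + " " + rowKey + " @ " + ts; }
    }

    // Regenerate the full set of index mutations from all versions of one
    // data row, independent of the order in which the versions arrived.
    static List<IndexMutation> regenerate(String dataRowKey, List<Version> versions) {
        TreeMap<Long, String> byTs = new TreeMap<>();
        for (Version v : versions) {
            byTs.put(v.ts(), v.value());   // sort by cell timestamp
        }
        List<IndexMutation> updates = new ArrayList<>();
        String prev = null;
        for (var e : byTs.entrySet()) {
            if (prev != null) {
                // The previous index row stops being current at this timestamp.
                updates.add(new IndexMutation("Del", prev + "," + dataRowKey, e.getKey()));
            }
            updates.add(new IndexMutation("Put", e.getValue() + "," + dataRowKey, e.getKey()));
            prev = e.getValue();
        }
        return updates;
    }

    public static void main(String[] args) {
        // The out-of-order example: B arrives last but carries timestamp 2000.
        List<Version> versions = List.of(
                new Version(1000, "A"),
                new Version(3000, "C"),
                new Version(2000, "B"));
        regenerate("1", versions).forEach(System.out::println);
        // Prints:
        // Put A,1 @ 1000
        // Del A,1 @ 2000
        // Put B,1 @ 2000
        // Del B,1 @ 3000
        // Put C,1 @ 3000
    }
}
{code}

What the sketch doesn't model is the Delete marker already sitting in the index (the Del of A,1 @ 3000 written before B arrived), which is exactly the open question above.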


  was:
Based on the investigation and work done in PHOENIX-3825 plus the existence of the ignoreNewerMutations flag, it seems that out of order rows are not handled correctly during index maintenance. Regardless of the order the server processes data table mutations, the resulting index rows should be the same and should purely be based on the cell time stamp of the data rows. Ideally, we shouldn't need the ignoreNewerMutations flag at all. Perhaps that was the intent with IndexUpdateManager.fixUpCurrentUpdates(), but it doesn't appear to be working.

Would it work to simply generate all the index rows for the mutating data rows for all versions? We should walk through a series of examples to see if this would work.  For example, with the following data table:

|Type|RowKey|Value|Timestamp
| Put | 1 | A | 1000
| Put | 1 | C | 3000

the index table would look like this:

|Type|RowKey|Timestamp
| Put | A,1 | 1000
| Del | A,1 | 3000
| Put | C,1 | 3000

Then if a Put comes in out of order at 2000, the data table would look like this:

|Type|RowKey|Value|Timestamp
| Put | 1 | A | 1000
| Put | 1 | B | 2000
| Put | 1 | C | 3000

and the index table should look like this:

|Type|RowKey|Timestamp
| Put | A,1 | 1000
| Del | A,1 | 2000
| Put | B,1 | 2000
| Del | B,1 | 3000
| Put | C,1 | 3000

Given that we can't reverse Delete markers, I'm not sure we can get there completely. We'd still have a Delete of A,1 @ 3000. But perhaps this is not a problem? We'd need to play this out further and include scenarios with row deletes as well.



> Handle out of order rows during index maintenance
> -------------------------------------------------
>
>                 Key: PHOENIX-3847
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3847
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Based on the investigation and work done in PHOENIX-3825, plus the existence of the ignoreNewerMutations flag, it seems that out-of-order rows are not handled correctly during index maintenance. When users replay failed batches, we currently force them to submit the batches in timestamp order. As long as the user provides the original timestamp, the order shouldn't matter: regardless of the order in which the server processes data table mutations, the resulting index rows should be the same and should be based purely on the cell timestamps of the data rows. Ideally, we shouldn't need the ignoreNewerMutations flag at all. Perhaps that was the intent of IndexUpdateManager.fixUpCurrentUpdates(), but it doesn't appear to be working.
> Would it work to simply generate all the index rows for the mutating data rows, across all of their versions? We should walk through a series of examples to see if this would work. For example, with the following data table:
> |Type|RowKey|Value|Timestamp
> | Put | 1 | A | 1000
> | Put | 1 | C | 3000
> the index table would look like this:
> |Type|RowKey|Timestamp
> | Put | A,1 | 1000
> | Del | A,1 | 3000
> | Put | C,1 | 3000
> Then if a Put comes in out of order at 2000, the data table would look like this:
> |Type|RowKey|Value|Timestamp
> | Put | 1 | A | 1000
> | Put | 1 | B | 2000
> | Put | 1 | C | 3000
> and the index table should look like this:
> |Type|RowKey|Timestamp
> | Put | A,1 | 1000
> | Del | A,1 | 2000
> | Put | B,1 | 2000
> | Del | B,1 | 3000
> | Put | C,1 | 3000
> Given that we can't reverse Delete markers, I'm not sure we can get there completely: we'd still have the original Delete of A,1 @ 3000. But since A,1 is already deleted at 2000, perhaps that leftover marker is not a problem? We'd need to play this out further and include scenarios with row deletes as well.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)