You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@phoenix.apache.org by "Kadir OZDEMIR (JIRA)" <ji...@apache.org> on 2019/01/04 19:31:00 UTC
[jira] [Commented] (PHOENIX-5018) Index mutations created by IndexTool will have wrong timestamps

    [ https://issues.apache.org/jira/browse/PHOENIX-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734512#comment-16734512 ] 

Kadir OZDEMIR commented on PHOENIX-5018:
----------------------------------------

*Background*

There are two ways to fully build an index: synchronously as part of the index create command, and asynchronously using IndexTool. The synchronous full build is done by using an UPSERT SELECT statement where data is read from the data table and inserted into the index table. The UPSERT SELECT code path does not include any logic special to index rebuild, and thus, the values for HBase timestamps for the insert operation are taken from the current wall clock. This is the root cause for getting incorrect timestamps when index mutations generated by UPSERT SELECT.

IndexTool is implemented using MapReduce. There are two types of mapper for IndexTool: PhoenixIndexImportMapper and PhoenixIndexImportDirectMapper. The former is used for the bulk loading option and the latter for the direct option of IndexTool. For both of these options, PhoenixIndexDBWriatable is used to read and write table rows. Reading rows is done through the ResultSet interface and writing is done through the PreparedStatement interface. Here, scanning of the data table, i.e., the SELECT statement, and inserting new rows into the index table, (the UPSERT statement) are executed separately. Since the PreparedStatement interface and thus the PhoenixPreparedStatement class do not have a method to specify the timestamps for individual columns, the index table rows gets the timestamps from the current wall clock. This is the root cause for getting incorrect timestamps when index mutations generated by using an UPSERT statement.

In addition to building index fully, there is also a way to rebuild index partially to recover from index write failures which can happen during data table updates. When such a failure happens on an index table, the index is disabled by setting the INDEX_STATE column of the row corresponding to the index table in SYSTEM.CATALOG.  Also the timestamp associated with the failed write is recorded in the INDEX_DISABLE_TIMESTAMP column of the same row. This is done in the UpdateIndexState method of IndexUtil. 

MetaDataRegionObserver runs a periodic task to go through SYSTEM.CATALOG to identify index tables that are ready for index rebuild. Using the INDEX_DISABLE_TIMESTAMP value, the rows of the data table is identified to be replayed to rebuild the index. During these replay writes, the timestamps values in the data table is correctly passed to the index table. In other words, the partial index rebuild does not have the timestamp problem existing in full index rebuilds. 

*Alternative Solutions*

There two main approaches to solve the timestamp problem. 

The first one is to enhance the PhoenixIndexDBWriatable class and the UPSERT and UPSERT SELECT code paths with the ability to retrieve and set HBase cell timestamps. This requires adding a new hint (e.g., TIMESTAMP) to UPSERT and UPSERT SELECT statements. This hint for UPSERT implies that the provided timestamp values should be used instead of using the current wall clock. For UPSERT SELECT, the hint means that cell timestamps should be retrieved by the SELECT statement and passed to the UPSERT statement. The following changes have been identified after a high level inspection of the code. 

PhoenixIndexDBWritable:
 * Change the type of the rowTs property from long to List<Long>
 * Change the methods accessing rowTs (i.e., write and getRowTs) accordingly
 * Add the timestamp hint the UPSERT statement
 * Pass the timestamps to the PreparedStatement argument. To do that unwrap it as PhoenixPreparedStatement and call setParamaterTimestamp(int parameterIndex, long timestamp) that would be introduced as part of this solution alternative

PhoenixPreparedStatment:
 * Add a method (e.g., void setParameterTimestamp(int parameterIndex, long timestamp))
 * Add a property called timestamps (List<long> timestamps)

PhoenixSQL grammar:
 * Add a new hint for preserving timestamps when populating an index from a data table, e.g., TIMESTAMP

HintNode:
 * Add the new hint for timestamps

UpsertCompiler:
 * Various part of the UpsertCompliler code needs to be changed when the timestamp hint is given (1) to retrieve cell timestamps and create mutations with these timestamps for the UPSERT SELECT statements and (2) to use the parameter timestamps in PhoenixPreparedStatment objects to create the mutations for UPSERT statements.

This list may not be complete and more changes may be required. As can be seen from the identified changes, the first alternative requires surgical changes in the very core of Phoenix. It requires changes in the SQL grammar (in the external API) which may be solely used for internal purposes.

The second alternative solution is to use MetaDataRegionObserver for both full index build and index partial rebuild. This solution implies abandoning using UPSERT SELECT for synchronous index full build and MapReduce for asynchronous index full build. Actually, with this alternative, we do not need to make a distinction between partial and full index builds or asynchronous or synchronous builds since there will be one code path that is used for all purposes. This solution requires setting INDEX_DISABLE_TIMESTAMP to a predefined value (or data table creation time) after an index table is created so that all the rows of the data table will considered by MetaDataRegionObserver (i.e., the partial rebuild will be turn into full index build). This also implies that all index full builds will be asynchronous, i.e., the full build will not happen in the context of index table create. 

The second solution requires less code changes and simplifies the current index implementation by eliminating the code and bugs for IndexTool. The main drawback of the second solution is that current MetaDataRegionObserver is single threaded and rebuilds one index at a time. However, it can be enhanced to be multithreaded which will benefit not only full index builds but also index rebuilds. Thus, I am inclining to implement the second alternative but would like to hear comments as the implications of the second alternative go beyond fixing this issue. I also want to make sure I did not miss something important in my analysis, [~gjacoby],[~vincentpoon].

> Index mutations created by IndexTool will have wrong timestamps
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0, 5.0.0
>            Reporter: Geoffrey Jacoby
>            Assignee: Kadir OZDEMIR
>            Priority: Major
>
> When doing a full rebuild (or initial async build) on an index using the IndexTool and PhoenixIndexImportDirectMapper, we generate the index mutations by creating an UPSERT SELECT query from the base table to the index, then taking the Mutations from it and inserting it directly into the index via an HBase HTable. 
> The timestamps of the Mutations use the default HBase behavior, which is to take the current wall clock. However, the timestamp of an index KeyValue should use the timestamp of the initial KeyValue in the base table.
> Having base table and index timestamps out of sync can cause all sorts of weird side effects, such as if the base table has data with an expired TTL that isn't expired in the index yet. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)