Posted to issues@phoenix.apache.org by "Chinmay Kulkarni (Jira)" <ji...@apache.org> on 2019/11/12 00:52:00 UTC

[jira] [Comment Edited] (PHOENIX-5546) TASK_TS being set as HConstants.LATEST_TIMESTAMP in SYSTEM.TASK table

    [ https://issues.apache.org/jira/browse/PHOENIX-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971933#comment-16971933 ] 

Chinmay Kulkarni edited comment on PHOENIX-5546 at 11/12/19 12:51 AM:
----------------------------------------------------------------------

There are 2 issues here.

First, as Thomas said, we don't re-resolve SYSTEM tables, so when we get the server-side timestamp, we get HConstants.LATEST_TIMESTAMP [here|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java#L820], which we later set as the ROW_TIMESTAMP inside generateMutations. We can solve this by ensuring that we re-resolve SYSTEM tables that use the ROW_TIMESTAMP column.

After that fix, we would be able to repeatedly create and drop without an issue, *however there is another problem*.

When inserting the task and updating the status of the task from CREATED to STARTED, we explicitly set some columns like TASK_END_TS and TASK_DATA to null (see [this|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java#L198]). Now since STORE_NULLS is false, this appears as a `DeleteColumn` marker on the table with the [same timestamp|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java#L635] as the ROW_TIMESTAMP:

!created.png|width=600,height=150!

Now, when the task is completed, we add a TASK_END_TS and update the TASK_DATA, but those puts also carry the same ROW_TIMESTAMP and get masked by the previous delete markers:

!completed.png|width=600,height=150!
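
To make the masking concrete, here is a minimal sketch of the behavior described above. It is a toy in-memory model, not the HBase or Phoenix API: the class and method names (DeleteMarkerMasking, Column, put, deleteColumn, read) are made up for illustration, and it only models the rule that a DeleteColumn marker hides every put at or below its timestamp, regardless of write order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, no HBase dependency.
class DeleteMarkerMasking {

    /** One column of one row, holding puts and DeleteColumn markers. */
    static final class Column {
        private static final class Cell {
            final boolean isDelete;
            final long ts;
            final String value;

            Cell(boolean isDelete, long ts, String value) {
                this.isDelete = isDelete;
                this.ts = ts;
                this.value = value;
            }
        }

        private final List<Cell> cells = new ArrayList<>();

        void put(long ts, String value) {
            cells.add(new Cell(false, ts, value));
        }

        /** Models a DeleteColumn marker: hides every put with timestamp <= ts. */
        void deleteColumn(long ts) {
            cells.add(new Cell(true, ts, null));
        }

        /** Returns the newest visible value, or null if every put is masked. */
        String read() {
            boolean hasDelete = false;
            long maxDeleteTs = Long.MIN_VALUE;
            for (Cell c : cells) {
                if (c.isDelete) {
                    hasDelete = true;
                    maxDeleteTs = Math.max(maxDeleteTs, c.ts);
                }
            }
            Cell newest = null;
            for (Cell c : cells) {
                if (c.isDelete) {
                    continue;
                }
                if (hasDelete && c.ts <= maxDeleteTs) {
                    continue; // masked, even if written after the marker
                }
                if (newest == null || c.ts >= newest.ts) {
                    newest = c;
                }
            }
            return newest == null ? null : newest.value;
        }
    }

    public static void main(String[] args) {
        long rowTimestamp = 1000L; // arbitrary stand-in for the ROW_TIMESTAMP value

        Column taskEndTs = new Column();
        taskEndTs.deleteColumn(rowTimestamp);     // CREATED/STARTED: column upserted as null
        taskEndTs.put(rowTimestamp, "end-time");  // COMPLETED: put at the same timestamp
        System.out.println(taskEndTs.read());     // prints null: the put is masked

        taskEndTs.put(rowTimestamp + 1, "visible");
        System.out.println(taskEndTs.read());     // prints visible: newer than the marker
    }
}
```

The second put is only there for contrast: a put with a strictly newer timestamp is not masked, which is why the problem shows up specifically when every mutation for the row reuses the same ROW_TIMESTAMP.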

So, as far as Phoenix is concerned, we've only modified the TASK_STATUS to "COMPLETED". The TASK_END_TS and TASK_DATA are not visible to us. The status transition looks like this in sqlline (after the updateCache fix to re-resolve the task table):

!sqlline.png|width=600,height=150!

One solution is to set STORE_NULLS to true for SYSTEM.TASK. Do you guys see any downside to this change (probably not, but would there be any issues since SYSTEM.TASK has a TTL set)? [~larsh] [~tdsilva] [~kozdemir]


> TASK_TS being set as HConstants.LATEST_TIMESTAMP in SYSTEM.TASK table
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-5546
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5546
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.15.0, 5.1.0
>            Reporter: Chinmay Kulkarni
>            Assignee: Chinmay Kulkarni
>            Priority: Blocker
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: MaxTs-repro-test.txt, PHOENIX-5546-4.x-HBase-1.3-v1.patch, completed-after-change.png, completed.png, created-after-change.png, created.png, sqlline-after-change.png, sqlline.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When we upsert DropChildViewTask entries into SYSTEM.TASK, the TASK_TS field which is designated as a ROW_TIMESTAMP always gets the HConstants.LATEST_TIMESTAMP value instead of the current server-side wall clock time.
> *The main side-effect of this bug is that subsequent creation and dropping of the same base table will not upsert new DropChildViewTasks into the SYSTEM.TASK table.*
> Steps to reproduce:
>  1) Start HBase server with 4.15.0 Phoenix
>  2) Create a base table and a view on top of that base table:
> {code:sql}
> CREATE TABLE IF NOT EXISTS Z_BASE_TABLE (ID INTEGER NOT NULL PRIMARY KEY, HOST VARCHAR(10), FLAG BOOLEAN);
> CREATE VIEW Z_VIEW1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER, col5 INTEGER) AS SELECT * FROM Z_BASE_TABLE WHERE ID>10;
> {code}
> 3) Drop the base table with the cascade option:
> {code:sql}
> DROP TABLE Z_BASE_TABLE CASCADE;
> {code}
> 4) Observe the SYSTEM.TASK table:
> {code:sql}
> SELECT TASK_TYPE, TASK_TS, TABLE_NAME, TASK_STATUS FROM SYSTEM.TASK;
> {code}
> --> gives the following:
> {code:sql}
> +------------+-------------------------------+---------------+--------------+
> | TASK_TYPE  |            TASK_TS            |  TABLE_NAME   | TASK_STATUS  |
> +------------+-------------------------------+---------------+--------------+
> | 1          | 292278994-08-16 23:12:55.807  | Z_BASE_TABLE  | COMPLETED    |
> +------------+-------------------------------+---------------+--------------+
> {code}
> That timestamp is basically HConstants.LATEST_TIMESTAMP.
> 5) Recreate the base table and view, drop the base table, and observe SYSTEM.TASK again (Steps 2 to 4). No new DropChildViewTask is added for the base table created the second time:
> {code:sql}
> +------------+-------------------------------+---------------+--------------+
> | TASK_TYPE  |            TASK_TS            |  TABLE_NAME   | TASK_STATUS  |
> +------------+-------------------------------+---------------+--------------+
> | 1          | 292278994-08-16 23:12:55.807  | Z_BASE_TABLE  | COMPLETED    |
> +------------+-------------------------------+---------------+--------------+
> {code}
> Thus, the views are still there and this seems to be an issue with the ROW_TIMESTAMP being assigned HConstants.LATEST_TIMESTAMP.
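
As a quick sanity check on the TASK_TS value in the output above: HConstants.LATEST_TIMESTAMP is Long.MAX_VALUE, and interpreting that as epoch milliseconds lands in the year 292278994. The sketch below uses only java.time and copies the constant's value rather than depending on HBase; the class and method names (LatestTimestampDemo, asUtc) are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Illustrative only: decodes the constant's value without an HBase dependency.
class LatestTimestampDemo {
    // Same value as HConstants.LATEST_TIMESTAMP, copied here as a plain long.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** Interprets a long as epoch milliseconds and returns it as a UTC date-time. */
    static ZonedDateTime asUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZonedDateTime t = asUtc(LATEST_TIMESTAMP);
        System.out.println(t.getYear()); // prints 292278994
        System.out.println(t.getMonth() + " " + t.getDayOfMonth()
                + " " + t.getHour() + ":" + t.getMinute() + ":" + t.getSecond());
    }
}
```

In UTC this is Aug 17, 07:12:55.807. The sqlline output shows 08-16 23:12:55.807, which would be consistent with the same instant rendered eight hours behind UTC, though the session time zone is an assumption here.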


