Posted to issues@phoenix.apache.org by "Chinmay Kulkarni (Jira)" <ji...@apache.org> on 2019/11/12 00:52:00 UTC
[jira] [Comment Edited] (PHOENIX-5546) TASK_TS being set as
HConstants.LATEST_TIMESTAMP in SYSTEM.TASK table
[ https://issues.apache.org/jira/browse/PHOENIX-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971933#comment-16971933 ]
Chinmay Kulkarni edited comment on PHOENIX-5546 at 11/12/19 12:51 AM:
----------------------------------------------------------------------
There are 2 issues here.
First, as Thomas said, we don't re-resolve SYSTEM tables, so when we get the server-side timestamp, we get HConstants.LATEST_TIMESTAMP [here|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java#L820], which we later set as the ROW_TIMESTAMP inside generateMutations. We can solve this by re-resolving SYSTEM tables that use a ROW_TIMESTAMP column.
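To make the odd date in SYSTEM.TASK concrete, here is a minimal sketch of what HConstants.LATEST_TIMESTAMP looks like when read as epoch milliseconds. The class name and the render helper are made up for illustration, and the constant is a local copy of Long.MAX_VALUE (which is what HConstants.LATEST_TIMESTAMP is defined as) so the snippet runs without HBase on the classpath. The UTC-8 offset is only an assumption about how the value in the issue description was displayed.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class LatestTimestampDemo {
    // Local stand-in for HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Hypothetical helper: interpret a cell timestamp as epoch millis at a given offset.
    static OffsetDateTime render(long epochMillis, ZoneOffset offset) {
        return Instant.ofEpochMilli(epochMillis).atOffset(offset);
    }

    public static void main(String[] args) {
        // Rendered in UTC.
        System.out.println(render(LATEST_TIMESTAMP, ZoneOffset.UTC));
        // Rendered at UTC-8 (assumed display offset, not confirmed by the issue).
        System.out.println(render(LATEST_TIMESTAMP, ZoneOffset.ofHours(-8)));
    }
}
```

Both renderings land in the year 292278994, which is why a TASK_TS that far in the future points at LATEST_TIMESTAMP rather than at a real wall clock time.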
After that fix, we would be able to repeatedly create and drop the table without an issue. *However, there is another problem*.
When inserting the task and updating the status of the task from CREATED to STARTED, we explicitly set some columns like TASK_END_TS and TASK_DATA to null (see [this|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/TaskRegionObserver.java#L198]). Now, since STORE_NULLS is false, each null appears as a `DeleteColumn` marker on the table with the [same timestamp|https://github.com/apache/phoenix/blob/30a67eec10f1012405143de2478b7554291eb0db/phoenix-core/src/main/java/org/apache/phoenix/execute/MutationState.java#L635] as the ROW_TIMESTAMP:
!created.png|width=600,height=150!
Now, when the task is completed, we add a TASK_END_TS and update the TASK_DATA, but those puts carry the same ROW_TIMESTAMP value as the earlier delete markers and are masked by them:
!completed.png|width=600,height=150!
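The masking can be sketched with a toy in-memory model. This is not HBase or Phoenix code: the class, its methods, the timestamp 1000 and the cell values are all made up for illustration. The only rule it encodes is that a `DeleteColumn` marker hides every put on that column whose timestamp is less than or equal to the marker's timestamp. Letting the later write win when two puts share a timestamp is a simplifying assumption of the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class DeleteMaskingDemo {
    enum Type { PUT, DELETE_COLUMN }

    // One stored cell version; value is null for delete markers.
    static final class Cell {
        final String column; final long ts; final Type type; final String value;
        Cell(String column, long ts, Type type, String value) {
            this.column = column; this.ts = ts; this.type = type; this.value = value;
        }
    }

    private final List<Cell> cells = new ArrayList<>();

    void put(String column, long ts, String value) {
        cells.add(new Cell(column, ts, Type.PUT, value));
    }

    void deleteColumn(String column, long ts) {
        cells.add(new Cell(column, ts, Type.DELETE_COLUMN, null));
    }

    // Returns the newest put that is not hidden by a DeleteColumn marker.
    Optional<String> read(String column) {
        Long deleteTs = null; // timestamp of the newest marker on this column, if any
        for (Cell c : cells) {
            if (c.column.equals(column) && c.type == Type.DELETE_COLUMN) {
                deleteTs = (deleteTs == null) ? c.ts : Math.max(deleteTs, c.ts);
            }
        }
        Cell newest = null;
        for (Cell c : cells) {
            if (!c.column.equals(column) || c.type != Type.PUT) continue;
            if (deleteTs != null && c.ts <= deleteTs) continue; // masked by the marker
            if (newest == null || c.ts >= newest.ts) newest = c; // tie: later write wins (assumption)
        }
        return newest == null ? Optional.empty() : Optional.of(newest.value);
    }

    // Replays the task lifecycle with every mutation at the same ROW_TIMESTAMP.
    static DeleteMaskingDemo runScenario() {
        long rowTs = 1000L; // hypothetical ROW_TIMESTAMP shared by all mutations
        DeleteMaskingDemo table = new DeleteMaskingDemo();
        // Task created: nulls become DeleteColumn markers because STORE_NULLS is false.
        table.put("TASK_STATUS", rowTs, "CREATED");
        table.deleteColumn("TASK_END_TS", rowTs);
        table.deleteColumn("TASK_DATA", rowTs);
        // Task completed: the new values are written at the same timestamp.
        table.put("TASK_STATUS", rowTs, "COMPLETED");
        table.put("TASK_END_TS", rowTs, "end-ts-value");
        table.put("TASK_DATA", rowTs, "task-data-value");
        return table;
    }

    public static void main(String[] args) {
        DeleteMaskingDemo table = runScenario();
        System.out.println("TASK_STATUS = " + table.read("TASK_STATUS"));
        System.out.println("TASK_END_TS = " + table.read("TASK_END_TS"));
        System.out.println("TASK_DATA   = " + table.read("TASK_DATA"));
    }
}
```

In this model only TASK_STATUS reflects the completion. The puts for TASK_END_TS and TASK_DATA are stored but never returned by a read, because the markers at the same timestamp cover them.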
So, as far as Phoenix is concerned, we've only modified the TASK_STATUS to "COMPLETED". The TASK_END_TS and TASK_DATA are not visible to us. The status transition looks like this in sqlline (after the updateCache fix to re-resolve the task table):
!sqlline.png|width=600,height=150!
One solution is to set STORE_NULLS to true for SYSTEM.TASK. Do you guys see any downside to this change (probably not, but would there be any issues since SYSTEM.TASK has a TTL set)? [~larsh] [~tdsilva] [~kozdemir]
> TASK_TS being set as HConstants.LATEST_TIMESTAMP in SYSTEM.TASK table
> ---------------------------------------------------------------------
>
> Key: PHOENIX-5546
> URL: https://issues.apache.org/jira/browse/PHOENIX-5546
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.15.0, 5.1.0
> Reporter: Chinmay Kulkarni
> Assignee: Chinmay Kulkarni
> Priority: Blocker
> Fix For: 4.15.0, 5.1.0
>
> Attachments: MaxTs-repro-test.txt, PHOENIX-5546-4.x-HBase-1.3-v1.patch, completed-after-change.png, completed.png, created-after-change.png, created.png, sqlline-after-change.png, sqlline.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When we upsert DropChildViewTask entries into SYSTEM.TASK, the TASK_TS field, which is designated as a ROW_TIMESTAMP, always gets the HConstants.LATEST_TIMESTAMP value instead of the current server-side wall clock time.
> *The main side effect of this bug is that subsequent creation and dropping of the same base table will not upsert new DropChildViewTasks into the SYSTEM.TASK table.*
> Steps to reproduce:
> 1) Start HBase server with 4.15.0 Phoenix
> 2) Create a base table and a view on top of that base table:
> {code:sql}
> CREATE TABLE IF NOT EXISTS Z_BASE_TABLE (ID INTEGER NOT NULL PRIMARY KEY, HOST VARCHAR(10), FLAG BOOLEAN);
> CREATE VIEW Z_VIEW1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER, col5 INTEGER) AS SELECT * FROM Z_BASE_TABLE WHERE ID>10;
> {code}
> 3) Drop the base table with the cascade option:
> {code:sql}
> DROP TABLE Z_BASE_TABLE CASCADE;
> {code}
> 4) Observe the SYSTEM.TASK table:
> {code:sql}
> SELECT TASK_TYPE, TASK_TS, TABLE_NAME, TASK_STATUS FROM SYSTEM.TASK;
> {code}
> --> gives the following:
> {code:sql}
> +------------+-------------------------------+---------------+--------------+
> | TASK_TYPE  | TASK_TS                       | TABLE_NAME    | TASK_STATUS  |
> +------------+-------------------------------+---------------+--------------+
> | 1          | 292278994-08-16 23:12:55.807  | Z_BASE_TABLE  | COMPLETED    |
> +------------+-------------------------------+---------------+--------------+
> {code}
> That timestamp is basically HConstants.LATEST_TIMESTAMP.
> 5) Recreate the base table and view, drop the base table, then observe SYSTEM.TASK again (Steps 2 to 4). No new DropChildViewTask is added for the base table created the second time:
> {code:sql}
> +------------+-------------------------------+---------------+--------------+
> | TASK_TYPE  | TASK_TS                       | TABLE_NAME    | TASK_STATUS  |
> +------------+-------------------------------+---------------+--------------+
> | 1          | 292278994-08-16 23:12:55.807  | Z_BASE_TABLE  | COMPLETED    |
> +------------+-------------------------------+---------------+--------------+
> {code}
> Thus, the views are still there and this seems to be an issue with the ROW_TIMESTAMP being assigned HConstants.LATEST_TIMESTAMP.