Posted to issues@phoenix.apache.org by "Chinmay Kulkarni (Jira)" <ji...@apache.org> on 2020/11/20 22:43:00 UTC

[jira] [Comment Edited] (PHOENIX-6086) Take a snapshot of all SYSTEM tables before attempting to upgrade them

    [ https://issues.apache.org/jira/browse/PHOENIX-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236488#comment-17236488 ] 

Chinmay Kulkarni edited comment on PHOENIX-6086 at 11/20/20, 10:42 PM:
-----------------------------------------------------------------------

I created this Jira to extend upgrade safety to all SYSTEM tables, so that we now take a snapshot of each one. I just realized that we also extended the restore-snapshot logic to all SYSTEM tables in case of an exception during EXECUTE UPGRADE (see [this|https://github.com/apache/phoenix/blob/0d0e86e7ba63d20e92d4e8c03259344a958dbcd1/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3940-L3944]). Thinking about this a bit more, I see several downsides to automatically restoring from the SYSTEM table snapshots, such as:

# Any DDLs issued since the upgrade began would be lost when we restore the snapshot of SYSTEM.CATALOG
# If we encounter an issue when upgrading any SYSTEM table, we would end up restoring the snapshots of all of them (I don't think there is necessarily a better way to handle this, since we can't restore only the snapshot for the table whose upgrade failed, or we'd be in a weird mixed metadata state)
# The point above also means that SYSTEM.SEQUENCE would be restored and that would break(?) sequences issued during this time.

I wanted to get your thoughts on how to handle this [~gjacoby] [~kadir] [~jisaac] [~vjasani] [~yanxinyi] [~sukumaddineni]. Since we currently don't log DDLs issued during the upgrade path, and because of the problem with sequences, I think that for now it is safer to just keep the snapshots around and let the operator decide how to handle the upgrade failure, rather than blindly forcing a restore from snapshots. What do you guys think?
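
To make the proposal concrete, here is a minimal sketch of the "keep the snapshots, don't auto-restore" flow. It is not the actual ConnectionQueryServicesImpl code: SnapshotAdmin, TableUpgrade, UpgradeFailedException and the snapshot naming scheme are all hypothetical stand-ins, used only to illustrate the idea.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class UpgradeSnapshotSketch {

    /** Hypothetical stand-in for the snapshot call of an HBase admin client. */
    interface SnapshotAdmin {
        void snapshot(String snapshotName, String tableName) throws Exception;
    }

    /** Hypothetical stand-in for the per-table upgrade step. */
    interface TableUpgrade {
        void upgrade(String tableName) throws Exception;
    }

    /** Hypothetical exception that reports which snapshots were kept (table name -> snapshot name). */
    static final class UpgradeFailedException extends Exception {
        final Map<String, String> retainedSnapshots;

        UpgradeFailedException(String message, Throwable cause,
                               Map<String, String> retainedSnapshots) {
            super(message, cause);
            this.retainedSnapshots = retainedSnapshots;
        }
    }

    /**
     * Snapshots every SYSTEM table, then upgrades each one. On any failure,
     * nothing is restored: the snapshots taken so far are handed back to the
     * caller so that an operator can decide whether to restore them.
     */
    static Map<String, String> upgradeKeepingSnapshots(List<String> systemTables,
                                                       SnapshotAdmin admin,
                                                       TableUpgrade upgrade,
                                                       long timestamp)
            throws UpgradeFailedException {
        Map<String, String> snapshots = new LinkedHashMap<>();
        try {
            // Take all snapshots first so they describe one pre-upgrade state.
            for (String table : systemTables) {
                // Assumed naming scheme, for illustration only.
                String name = "SNAPSHOT_" + table.replace('.', '_') + "_" + timestamp;
                admin.snapshot(name, table);
                snapshots.put(table, name);
            }
            for (String table : systemTables) {
                upgrade.upgrade(table);
            }
            return snapshots;
        } catch (Exception e) {
            // Deliberately no restore here: surface the failure and the snapshot names.
            throw new UpgradeFailedException(
                    "Upgrade failed; snapshots retained for manual restore: " + snapshots,
                    e, snapshots);
        }
    }
}
```

With something like this, DDLs and sequence values issued after the upgrade began are not silently rolled back; the operator sees the failure along with the snapshot names and can choose to restore all SYSTEM tables together.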


was (Author: ckulkarni):
[~vjasani] I created this Jira to extend safety during upgrades for all SYSTEM tables, so that we now would take a snapshot of each one. I just realized that we also then extended the restore-snapshot logic to all SYSTEM tables in case of an exception during EXECUTE UPGRADE (see [this|https://github.com/apache/phoenix/blob/0d0e86e7ba63d20e92d4e8c03259344a958dbcd1/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3940-L3944]). Thinking about this a little bit, there can be various downsides to automatically restoring from the SYSTEM table snapshot, such as:

# Any DDLs issued since the upgrade began would be lost when we restore the snapshot of SYSTEM.CATALOG
# If we encounter an issue when upgrading any SYSTEM table, we would end up restoring the snapshots of all of them (I don't think there is necessarily a better way to handle this since we can't just restore the snapshot for the table whose upgrade failed or we'd be in a weird mixed metadata state)
# The point above also means that SYSTEM.SEQUENCE would be restored and that would break(?) sequences issued during this time.

I wanted to get your thoughts on how to handle this [~gjacoby] [~kadir] [~jisaac]. Since we currently don't log DDLs issued during the upgrade path and because of the problem with sequences I think for now, maybe it is safer to just keep the snapshots around and allow the operator to decide how to handle the upgrade failure rather than blindly forcing a restore from snapshots. What do you guys think?

> Take a snapshot of all SYSTEM tables before attempting to upgrade them
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-6086
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6086
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.0.0, 4.15.0
>            Reporter: Chinmay Kulkarni
>            Assignee: Viraj Jasani
>            Priority: Critical
>             Fix For: 5.1.0, 4.16.0
>
>         Attachments: PHOENIX-6086.4.x.000.patch, PHOENIX-6086.master.000.patch, PHOENIX-6086.master.002.patch, PHOENIX-6086.master.003.patch
>
>
> Currently we only take a snapshot of SYSTEM.CATALOG before attempting to upgrade it (see [this|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3718]). From 4.15 onwards we also store critical metadata in other SYSTEM tables like SYSTEM.CHILD_LINK, so it is beneficial to snapshot those tables as well before upgrading them.
> We also currently don't take a snapshot of SYSTEM.CATALOG on receiving an [UpgradeRequiredException|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L3685-L3707], which we should do.
> In case of any errors during the upgrade, we restore SYSTEM.CATALOG from this snapshot and we should extend this to all tables. In cases where the table didn't exist before the upgrade, we need to ensure it is dropped so that a subsequent upgrade attempt can start afresh.


