Posted to dev@phoenix.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2016/08/11 19:28:20 UTC

[jira] [Comment Edited] (PHOENIX-3165) System table integrity check and repair tool

    [ https://issues.apache.org/jira/browse/PHOENIX-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417792#comment-15417792 ] 

Andrew Purtell edited comment on PHOENIX-3165 at 8/11/16 7:27 PM:
------------------------------------------------------------------

bq. Unfortunately, that's not possible across all the features of Phoenix:

No integrity check or repair tool will handle 100% of cases. There will always be situations where fallback to manual recovery is necessary, and some aspects of metadata will be tricky or not amenable to automated repair at all. That said, I'm comfortable stating that long operator experience with 'fsck'-class tools, over the history of computing systems, demonstrates their utility. Take HBase's hbck as an example. I know it covers only a subset of possible problems, but it once allowed me to recover a critical production system in minutes. Imagine if my only recourse had been hacking META table HFiles with the Hadoop fs shell! It would have been hours of high-profile downtime instead of minutes, which was serious enough. 

bq. Corruption can take many forms, though. I think it's important to understand the root cause of the corruption, as IMHO prevention is the best medicine.

It's not possible to prevent corruption entirely. There are too many opportunities, too many chains of events that lead to this outcome. As I said, even with a recovery tool there will be cases where the tool won't help, but there are also cases - and with care and attention, these are likely to be the common ones - where a recovery tool will let the user bring their system back to an available state very quickly. Snapshots and backups increase the overall margin of safety but are never a quick or complete solution for system recovery; by definition they miss the latest updates. Recovering from the latest state by applying a dynamically analyzed delta is faster, a deft surgical tool compared to the big drop-and-restore hammer.

Metadata repair tools are no different from index rebuild tools in this respect: the RDBMS has metadata, the system is meant for mission-critical operation, and so the system requires operational tools that meet that objective.

bq. Updating HBase metadata with every change to the SYSTEM.CATALOG would put a huge drag on the system.

How so? 

bq. If we're going to do something like that, better to change the design and keep the system-of-record in zookeeper instead.

I don't think "system of record" is a use case suitable for ZooKeeper, and I believe this is a common understanding; it's certainly a frequent conclusion in system design discussions I have been part of. That is not a knock on ZooKeeper. It is rock solid as a coordination and consensus service.

bq. Best to have Phoenix-level APIs instead that can guarantee that the system catalog is kept in a valid state with commits being performed transactionally.

Sure, "Does not depend on the Phoenix client" can be rephrased, hopefully better, as "Is Phoenix code using blessed repair mechanisms that do not depend on the normal client code paths".

I don't think we can depend on transactional functionality always being in a workable state, if you are referring to the 4.8+ transactional functionality that requires Tephra and its metadata to be in working order. 

bq.  If the table becomes corrupt, it'd be potentially ambiguous on how to fix it. In theory, I suppose, a tool could let the user choose between the possible choices it'd make to fix it.

This is often the case with other 'fsck'-style applications, and offering the user a choice of repairs is a very common option. Consider Windows CHKDSK and the Linux fsck suite as two widely deployed examples.



> System table integrity check and repair tool
> --------------------------------------------
>
>                 Key: PHOENIX-3165
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3165
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Critical
>
> When the Phoenix system tables become corrupt, recovery is a painstaking process of low-level examination of table contents, and manipulation of same, with the HBase shell. This is very difficult work providing no margin of safety, and it is a critical gap in terms of usability.
> At the OS level, we have fsck.
> At the HDFS level, we have fsck (integrity checking only, though)
> At the HBase level, we have hbck. 
> At the Phoenix level, we lack a system table repair tool. 
> Implement a tool that:
> - Does not depend on the Phoenix client.
> - Supports integrity checking of SYSTEM tables. Check for the existence of all required columns in entries. Check that entries exist for all Phoenix managed tables (implies Phoenix should add supporting advisory-only metadata to the HBase table schemas). Check that serializations are valid. 
> - Supports complete repair of SYSTEM.CATALOG and recreation, if necessary, of other tables like SYSTEM.STATS which can be dropped to recover from an emergency. We should be able to drop SYSTEM.CATALOG (or any other SYSTEM table), run the tool, and have a completely correct recreation of SYSTEM.CATALOG available at the end of its execution.
> - To the extent we have or introduce cross-system-table invariants, check them and offer a repair or reconstruction option.
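As a rough illustration of the "required columns" check above, here is a minimal Java sketch. The class name, the REQUIRED_HEADER_QUALIFIERS set, and the specific qualifier names are hypothetical stand-ins for illustration; a real tool would collect the qualifiers actually present per table-header row by scanning SYSTEM.CATALOG with the HBase client, rather than from in-memory sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class CatalogIntegrityCheck {
    // Qualifiers assumed required in every table-header row of SYSTEM.CATALOG.
    // Illustrative only; the real required set would come from Phoenix metadata code.
    static final Set<String> REQUIRED_HEADER_QUALIFIERS =
        new HashSet<>(Arrays.asList("TABLE_TYPE", "TABLE_SEQ_NUM", "COLUMN_COUNT"));

    /** Returns the required qualifiers missing from one catalog header row, sorted. */
    static Set<String> missingQualifiers(Set<String> presentQualifiers) {
        Set<String> missing = new TreeSet<>(REQUIRED_HEADER_QUALIFIERS);
        missing.removeAll(presentQualifiers);
        return missing;
    }

    public static void main(String[] args) {
        // Simulate a corrupt header row that lost its COLUMN_COUNT cell.
        Set<String> present = new HashSet<>(Arrays.asList("TABLE_TYPE", "TABLE_SEQ_NUM"));
        System.out.println("missing=" + missingQualifiers(present)); // prints missing=[COLUMN_COUNT]
    }
}
```

A per-row check like this is the easy half; the tool would then either report each incomplete row or, in repair mode, reconstruct the missing cells from the advisory metadata proposed above.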



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)