Posted to dev@hbase.apache.org by "Geoffrey Jacoby (Jira)" <ji...@apache.org> on 2020/01/29 18:52:00 UTC

[jira] [Created] (HBASE-23766) Support Point-In-Time Queries

Geoffrey Jacoby created HBASE-23766:
---------------------------------------

             Summary: Support Point-In-Time Queries
                 Key: HBASE-23766
                 URL: https://issues.apache.org/jira/browse/HBASE-23766
             Project: HBase
          Issue Type: New Feature
            Reporter: Geoffrey Jacoby
            Assignee: Geoffrey Jacoby


HBase currently offers a snapshot feature which allows operators to capture the state of a table at a point in time so that it can be cloned or queried later. It's quite useful in some circumstances, but limited because it's a heavyweight operation, and because you must know in advance which point in time you want to capture.

Phoenix currently offers a feature called "SCN", which uses the max timestamp on Scans to provide the illusion of a "lookback" query at a point in time. It's imperfect, however, because HBase's filtering and cleanup logic for deletes, max versions, and TTLs can prevent users from seeing certain Cells they would have been able to see at that earlier point in time. Even PHOENIX-5645 and the equivalent HBASE-23602, which try to control major compaction cleanup, don't cover every edge case. (For example, you can't see rows whose TTL has expired now but hadn't back then. The same goes for max versions.)
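
To make the TTL and max-versions edge cases concrete, here is a minimal toy model in Python. It is not HBase or Phoenix code: the names Cell, surviving_cells and lookback_read are hypothetical, timestamps are plain integers, delete markers are ignored, and cleanup is assumed to run against the current time before the upper-bound timestamp (treated as exclusive) is applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int  # write time (arbitrary integer units)
    value: str


def surviving_cells(cells: List[Cell], now: int, ttl: int, max_versions: int) -> List[Cell]:
    """Toy cleanup as of `now`: drop TTL-expired cells, keep the newest max_versions."""
    live = [c for c in cells if c.timestamp >= now - ttl]
    live.sort(key=lambda c: c.timestamp, reverse=True)
    return live[:max_versions]


def lookback_read(cells: List[Cell], now: int, ttl: int,
                  max_versions: int, scn: int) -> Optional[Cell]:
    """Newest surviving cell written strictly before `scn`, or None."""
    candidates = [c for c in surviving_cells(cells, now, ttl, max_versions)
                  if c.timestamp < scn]
    return candidates[0] if candidates else None


v1, v2, v3 = Cell(100, "v1"), Cell(200, "v2"), Cell(300, "v3")

# A reader at time 250 sees v2.
seen_then = lookback_read([v1, v2], now=250, ttl=200, max_versions=1, scn=250)

# Much later, a lookback to 250 finds nothing: v2's TTL has expired.
ttl_case = lookback_read([v1, v2, v3], now=1000, ttl=200, max_versions=1, scn=250)

# With a long TTL, max_versions=1 still hides v2 because v3 displaced it.
versions_case = lookback_read([v1, v2, v3], now=1000, ttl=10_000, max_versions=1, scn=250)
```

In this sketch seen_then is v2 while ttl_case and versions_case are both None, which is the gap described above: filtering against the current time hides cells that were visible at the requested point in time.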

There are useful non-Phoenix applications as well, such as a change stream that shows before/after images, as DynamoDB offers. 

Since full support will require new configuration options added not just to major compaction, but also to the read pipeline, I'm filing this as an umbrella JIRA so we can have smaller sub-tasks, rather than trying to cram everything into HBASE-23602. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)