Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2017/03/21 22:45:41 UTC

[jira] [Commented] (PHOENIX-3744) Support snapshot scanners

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935491#comment-15935491 ] 

James Taylor commented on PHOENIX-3744:
---------------------------------------

Key questions:
- Do we handle the case in which there are unflushed changes in the memstore for the table being read? Maybe we don't care: at worst, the data you'd see would be 1 hour old.
- How do we indicate to Phoenix that it should do a snapshot read as opposed to its regular read path? Maybe the MR read path (used by Pig and Spark) always (or optionally) does a snapshot read?
- What about our Phoenix coprocessors (which I believe will be bypassed if we use HDFS snapshot reads)? If snapshot reads are only for the MR path, we might not need them. If we do need them, we can probably wrap the scanner as needed, but there might be some refactoring required.


> Support snapshot scanners
> -------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region directly in HDFS. We should make sure that Phoenix can support that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (e.g. based on the estimated number of bytes that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed after this time should not be seen while a query is running.
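
To make the ideas above concrete, here is a minimal sketch of one way the listed criteria might be combined into a single decision. It is not Phoenix or HBase code: the class name, method names, and every parameter (scn, compileTs, oldestMemstoreTs, optedIn, estimatedBytes, minBytes) are hypothetical, and the way the rules are combined is an assumption. The ticket lists them as alternatives, not as a settled design.

```java
import java.util.OptionalLong;

/** Illustrative only; none of these names exist in Phoenix or HBase. */
public final class SnapshotReadPolicy {
    private SnapshotReadPolicy() {}

    /**
     * True when unflushed memstore data cannot be visible to the query:
     * either the memstore is empty (modeled here as an empty OptionalLong),
     * or the query runs at a timestamp earlier than any memstore data.
     */
    public static boolean memstoreIrrelevant(long queryTs, OptionalLong oldestMemstoreTs) {
        return !oldestMemstoreTs.isPresent() || queryTs < oldestMemstoreTs.getAsLong();
    }

    /**
     * Assumed combination of the ideas: the query timestamp is the SCN if one
     * is set, otherwise the compile time. A snapshot read is chosen only if
     * the memstore cannot affect the result AND either the table/query opted
     * in (config option or hint) or the estimated scan size reaches a threshold.
     */
    public static boolean shouldUseSnapshot(OptionalLong scn,
                                            long compileTs,
                                            OptionalLong oldestMemstoreTs,
                                            boolean optedIn,
                                            long estimatedBytes,
                                            long minBytes) {
        long queryTs = scn.isPresent() ? scn.getAsLong() : compileTs;
        if (!memstoreIrrelevant(queryTs, oldestMemstoreTs)) {
            return false; // memstore may hold rows the query must see
        }
        return optedIn || estimatedBytes >= minBytes;
    }

    public static void main(String[] args) {
        // Point-in-time query at ts=40, memstore data starts at ts=50, large scan.
        boolean useSnapshot = shouldUseSnapshot(
                OptionalLong.of(40L), 100L, OptionalLong.of(50L),
                false, 2_000L, 1_000L);
        System.out.println("use snapshot: " + useSnapshot);
    }
}
```

Treating the memstore check as a hard precondition is a conservative choice made for this sketch; if slightly stale reads are acceptable, that check could be relaxed or dropped.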



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)