Posted to dev@phoenix.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/08 21:13:04 UTC

[jira] [Commented] (PHOENIX-3744) Support snapshot scanners for MR-based queries

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001550#comment-16001550 ] 

ASF GitHub Bot commented on PHOENIX-3744:
-----------------------------------------

Github user JamesRTaylor commented on the issue:

    https://github.com/apache/phoenix/pull/239
  
    I don't think it's necessary to fully understand the functionality to do the refactoring I've mentioned, @akshita-malhotra. Here's how I'd recommend approaching it:
    
    * create a new interface, RegionContext, whose sole purpose is to abstract access to RegionCoprocessorEnvironment. The interface would have at least two methods: getRegion and getConfiguration. We might need more if other RegionCoprocessorEnvironment methods are called.
    * have two implementations of this interface: RegionCoprocessorContext and RegionSnapshotContext. The constructor of RegionCoprocessorContext would take a RegionCoprocessorEnvironment as an argument, while the constructor of RegionSnapshotContext would take a Region and a Configuration.
    * do an across-the-board replacement of RegionCoprocessorEnvironment with RegionContext. You can likely skip this for secondary-index-related code (org.apache.phoenix.hbase.index.Indexer and PhoenixTransactionalIndexer). This is where you'll find out whether other methods are called on RegionCoprocessorEnvironment or ObserverContext; those can be dealt with in a variety of ways, for example by throwing an UnsupportedOperationException in the snapshot implementation if need be.
    * in the top-level coprocessor methods that are passed a RegionCoprocessorEnvironment (mostly in the abstract BaseScannerRegionObserver class), instantiate a RegionCoprocessorContext by passing in the RegionCoprocessorEnvironment. From this point onward, all access will go through the RegionContext interface.
    
    You could do this refactoring completely separately from PHOENIX-3744 so that you don't mix the two. PHOENIX-3744 would then have something like a RegionScannerFactory (your RegionObserverUtil) that gives you back a RegionScanner given a RegionContext, and you'd create a RegionSnapshotContext as the backing implementation in your snapshot reading code.
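
A minimal sketch of the shape described above, to make the proposal concrete. It is not taken from the pull request. Region, Configuration, RegionCoprocessorEnvironment, and RegionScanner are empty or reduced stand-ins for the real HBase/Hadoop types so that the example compiles on its own, and the getRegionScanner signature on RegionScannerFactory is an assumption made for illustration only.

```java
// Stand-ins for the real HBase/Hadoop types named in the comment.
// They carry no behavior and exist only so this sketch compiles on its own.
interface Region {}
final class Configuration {}
interface RegionScanner {}

// Reduced stand-in: only the two accessors the comment relies on.
interface RegionCoprocessorEnvironment {
    Region getRegion();
    Configuration getConfiguration();
}

// The proposed abstraction, with the two methods named in the comment.
interface RegionContext {
    Region getRegion();
    Configuration getConfiguration();
}

// Coprocessor-backed implementation: delegates to the environment it wraps.
final class RegionCoprocessorContext implements RegionContext {
    private final RegionCoprocessorEnvironment env;

    RegionCoprocessorContext(RegionCoprocessorEnvironment env) {
        this.env = env;
    }

    @Override
    public Region getRegion() {
        return env.getRegion();
    }

    @Override
    public Configuration getConfiguration() {
        return env.getConfiguration();
    }
}

// Snapshot-backed implementation: holds the Region and Configuration directly,
// since no coprocessor environment exists when reading a snapshot.
final class RegionSnapshotContext implements RegionContext {
    private final Region region;
    private final Configuration configuration;

    RegionSnapshotContext(Region region, Configuration configuration) {
        this.region = region;
        this.configuration = configuration;
    }

    @Override
    public Region getRegion() {
        return region;
    }

    @Override
    public Configuration getConfiguration() {
        return configuration;
    }
}

// Hypothetical factory signature: the comment only says the factory returns
// a RegionScanner given a RegionContext; any further parameters are unknown.
interface RegionScannerFactory {
    RegionScanner getRegionScanner(RegionContext context);
}
```

With this shape, a coprocessor hook would wrap its environment once (new RegionCoprocessorContext(env)) and the snapshot reader would build a RegionSnapshotContext from the Region and Configuration it already holds. Everything downstream would then depend only on RegionContext.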
    
    
    
    



> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region directly in HDFS. We should make sure that Phoenix can support that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed after this time should not be seen while a query is running.


