Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/05 14:30:00 UTC

[jira] [Commented] (FLINK-8862) Support HBase snapshot read

    [ https://issues.apache.org/jira/browse/FLINK-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386135#comment-16386135 ] 

ASF GitHub Bot commented on FLINK-8862:
---------------------------------------

GitHub user neoremind opened a pull request:

    https://github.com/apache/flink/pull/5639

    [FLINK-8862] [HBase] Support HBase snapshot read

    ## What is the purpose of the change
    
    *The flink-hbase connector currently supports reading HBase only through region server scanners. HBase also offers [snapshot](http://hbase.apache.org/book.html#ops.snapshots) scanning, and Hadoop already provides both ways to scan HBase: [TableInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html) and [TableSnapshotInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.html). It would be great if Flink supported both as well, to cover a wider range of use cases and give users an alternative.*
    
    
    ## Brief change log
    
      - *Create the `TableInputSplitStrategy` interface and its implementations as an abstraction layer for `AbstractTableInputFormat`*
      - *Update `HBaseRowInputFormat` and `TableInputFormat`*
      - *Add `HBaseSnapshotRowInputFormat` and `TableSnapshotInputFormat`*
      - *Extract two interfaces: `HBaseTableScannerAware` and `ResultToTupleMapper`*
      - *Add `HBaseSnapshotReadExample`*
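    
    The strategy-style split abstraction listed above can be pictured with a minimal, self-contained sketch. It does not use the pull request's code or the Flink/HBase APIs: `SplitStrategy`, `RegionServerScanStrategy`, `SnapshotScanStrategy`, and `planSplits` are hypothetical names chosen for illustration, and the string "splits" merely stand in for real input splits. The actual `TableInputSplitStrategy` interface may declare different methods.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    
    // Illustrative only: all names here are hypothetical, not the PR's API.
    class SplitStrategySketch {
    
        // Stand-in for a split-planning abstraction such as TableInputSplitStrategy.
        interface SplitStrategy {
            // Maps region names to human-readable split descriptions.
            List<String> createSplits(List<String> regions);
        }
    
        // Models the existing path: every split is read through a region server scanner.
        static class RegionServerScanStrategy implements SplitStrategy {
            @Override
            public List<String> createSplits(List<String> regions) {
                List<String> splits = new ArrayList<>();
                for (String region : regions) {
                    splits.add("scan:" + region);
                }
                return splits;
            }
        }
    
        // Models the new path: every split reads one region of a named snapshot.
        static class SnapshotScanStrategy implements SplitStrategy {
            private final String snapshotName;
    
            SnapshotScanStrategy(String snapshotName) {
                this.snapshotName = snapshotName;
            }
    
            @Override
            public List<String> createSplits(List<String> regions) {
                List<String> splits = new ArrayList<>();
                for (String region : regions) {
                    splits.add("snapshot:" + snapshotName + "/" + region);
                }
                return splits;
            }
        }
    
        // Stand-in for an input format that delegates split planning to the strategy.
        static List<String> planSplits(SplitStrategy strategy, List<String> regions) {
            return strategy.createSplits(regions);
        }
    
        public static void main(String[] args) {
            List<String> regions = Arrays.asList("region-1", "region-2");
            System.out.println(planSplits(new RegionServerScanStrategy(), regions));
            System.out.println(planSplits(new SnapshotScanStrategy("my-snapshot"), regions));
        }
    }
    ```
    
    In this sketch the caller of `planSplits` never needs to know which read path is in use, which is presumably the motivation for putting the split logic behind an interface.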
    
    
    ## Verifying this change
    
    This change is already covered by the existing tests listed below, and new test cases have been added as well.
    
    `org.apache.flink.addons.hbase.HBaseConnectorITCase`
    
    This change added tests and can be verified as follows:
    
      - *Manually create a snapshot of a specific HBase table, and use `TableSnapshotInputFormat` to do a full scan.*
      - *Run the existing `HBaseReadExample` to do a full scan.*
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (**yes** / no)
      - If yes, how is the feature documented? (not applicable / **docs** / **JavaDocs** / not documented)
      - For documentation, see the [JIRA ticket](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8862?filter=allopenissues); a detailed design doc and class diagram are attached there.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/neoremind/flink snapshot

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5639
    
----
commit 0b36b434f987a971b6463ce3441c483380cfa9dd
Author: neoremind <xu...@...>
Date:   2018-03-05T14:14:09Z

    Support HBase snapshot read

----


> Support HBase snapshot read
> ---------------------------
>
>                 Key: FLINK-8862
>                 URL: https://issues.apache.org/jira/browse/FLINK-8862
>             Project: Flink
>          Issue Type: Improvement
>          Components: Batch Connectors and Input/Output Formats
>    Affects Versions: 1.2.0
>            Reporter: Xu Zhang
>            Priority: Major
>         Attachments: FLINK-8862-Design-Class-Diagram.png, FLINK-8862-DesignDoc.pdf
>
>
> The flink-hbase connector currently supports reading HBase only through region server scanners. HBase also offers snapshot scanning, and Hadoop already provides both ways to scan HBase: TableInputFormat and TableSnapshotInputFormat. It would be great if Flink supported both as well, to cover a wider range of use cases and give users an alternative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)