Posted to commits@beam.apache.org by "Ismaël Mejía (JIRA)" <ji...@apache.org> on 2018/05/02 20:52:00 UTC

[jira] [Commented] (BEAM-2955) Create a Cloud Bigtable HBase connector

    [ https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461613#comment-16461613 ] 

Ismaël Mejía commented on BEAM-2955:
------------------------------------

Hi, recent work has focused on implementing the new way to write IOs in Beam (SplittableDoFn). See BEAM-4020 and the PR (still to be finished). However, the public API is not changing at all; only the internals change, with a different implementation of the expand method. So I think you can go ahead and start; the changes should be easy to merge or rebase (if that word even exists).
Also take a look at the new restriction tracker, ByteKeyRangeTracker. It will surely be useful when you decide to move the Bigtable implementation to SDF (probably once Dataflow supports it, at which point it will be able to do liquid sharding there too).
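
To illustrate the idea behind a key-range restriction tracker, here is a minimal sketch in plain Java. It is not Beam's ByteKeyRangeTracker and does not use any Beam API: the class SimpleKeyRangeTracker, its methods, and the use of String keys instead of byte keys are all assumptions made for illustration. The reader claims each key before processing it, and a split hands the unclaimed tail of the range to someone else, which is roughly what makes dynamic rebalancing possible.

```java
/**
 * Hypothetical, simplified key-range tracker (illustration only, not Beam's API).
 * Tracks the range [startKey, endKey) and which keys were already claimed.
 */
final class SimpleKeyRangeTracker {
    private final String startKey; // inclusive lower bound
    private String endKey;         // exclusive upper bound; shrinks on split
    private String lastClaimed;    // null until the first successful claim

    SimpleKeyRangeTracker(String startKey, String endKey) {
        if (startKey.compareTo(endKey) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        this.startKey = startKey;
        this.endKey = endKey;
    }

    /** Returns true if the caller may process the key, false once the range is exhausted. */
    synchronized boolean tryClaim(String key) {
        if (key.compareTo(startKey) < 0) {
            throw new IllegalArgumentException("key lies before the start of the range");
        }
        if (lastClaimed != null && key.compareTo(lastClaimed) <= 0) {
            throw new IllegalArgumentException("keys must be claimed in increasing order");
        }
        if (key.compareTo(endKey) >= 0) {
            return false; // outside the (possibly shrunk) range: stop reading
        }
        lastClaimed = key;
        return true;
    }

    /**
     * Gives away [splitKey, endKey) and keeps [startKey, splitKey).
     * Returns the residual range as {start, end}, or null if the split is not possible.
     */
    synchronized String[] trySplitAt(String splitKey) {
        // The split point must lie strictly after everything already claimed.
        String lowerBound = (lastClaimed == null) ? startKey : lastClaimed;
        if (splitKey.compareTo(lowerBound) <= 0 || splitKey.compareTo(endKey) >= 0) {
            return null;
        }
        String[] residual = {splitKey, endKey};
        endKey = splitKey;
        return residual;
    }

    public static void main(String[] args) {
        SimpleKeyRangeTracker tracker = new SimpleKeyRangeTracker("a", "z");
        System.out.println(tracker.tryClaim("b"));            // true
        String[] residual = tracker.trySplitAt("m");
        System.out.println(residual[0] + ".." + residual[1]); // m..z
        System.out.println(tracker.tryClaim("k"));            // true
        System.out.println(tracker.tryClaim("p"));            // false
    }
}
```

In this sketch, a claim that returns false tells the reader to stop, so work given away by a split is never processed twice.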

> Create a Cloud Bigtable HBase connector
> ---------------------------------------
>
>                 Key: BEAM-2955
>                 URL: https://issues.apache.org/jira/browse/BEAM-2955
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>            Priority: Major
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a different repo for a while. Recently, we did some reworking of the Cloud Bigtable client that would allow it to better coexist in the Beam ecosystem, and we also released a Beam connector in our repository that exposes HBase idioms rather than the Protobuf idioms of BigtableIO. More information about the customer experience of the HBase connector can be found here: [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase Maven project. We can extend HBaseIO for the purposes of CBT. We would have to add some features to HBaseIO to make that work (dynamic rebalancing, and a way for HBase and CBT's size estimation models to coexist).
> # The BigtableIO connector works well, and we can add an adapter layer on top of it.  I have a proof of concept of it here: [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion about the right approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)