Posted to user@hbase.apache.org by Leon Bein <le...@student.hpi.de> on 2020/12/04 15:08:51 UTC

Topic: CDC for Flink Source

Hi @all,

we are currently developing an HBase source for Flink with the new source API
(FLIP-27
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface>).
For this, we are trying to implement change data capture (CDC) on HBase.

What are your opinions on which approach to pursue?
Options that we found so far include

  * using Coprocessors, e.g. RegionObserver
  * reading the WAL, e.g. with ProtobufLogReader (which is marked as
    LimitedPrivate)
  * and using ReplicationEndpoint (which also seems to be more internal).

Best regards,
Leon Bein

Re: Topic: CDC for Flink Source

Posted by Bharath Vissapragada <bh...@apache.org>.
cc: dev@ bcc: user@

We are also building a database change stream with HBase/Phoenix as the
data source (for another event bus implementation), and we hope to open
source it soon. We had similar discussions and ended up taking an approach
like SEP
<https://www.ngdata.com/the-hbase-side-effect-processor-and-hbase-replication-monitoring/>.
For us it was critical not to have external service dependencies or
third-party code running inside HBase (which is the case if we take a
coprocessor or a custom ReplicationEndpoint approach).

Building something like this on top of the replication framework also has a
lot of benefits: it already handles checkpointing and has built-in
backpressure handling (in case of downstream delays), and it has been
battle-tested for a while. There are some edge cases with ordering
guarantees because of quirks in our replication design, but we plan to fix
those too in the coming months.
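
To make the checkpointing and backpressure points a bit more concrete, here
is a minimal, self-contained Java sketch. It is only a toy model and does not
use any HBase classes: ChangeFeedSketch, Change, ship, poll and ack are
hypothetical names made up for this example, and the real replication
framework differs in detail. The idea it illustrates is that a bounded buffer
pushes back on the shipper when the consumer lags, and that the checkpoint
only advances once a batch has been acknowledged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Toy model of a replication-style change feed (hypothetical, no HBase APIs). */
final class ChangeFeedSketch {

    /** Hypothetical change record: a WAL-like sequence id plus an opaque payload. */
    static final class Change {
        final long seqId;
        final String payload;

        Change(long seqId, String payload) {
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    // Bounded buffer between the shipper and the downstream consumer.
    private final BlockingQueue<Change> buffer;

    // Highest sequence id acknowledged by the consumer; -1 means "nothing yet".
    private long checkpoint = -1L;

    ChangeFeedSketch(int capacity) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Shipper side. Returns false when the buffer is full, so the caller can
     * retry the same change later instead of dropping it (backpressure).
     */
    boolean ship(Change change) {
        return buffer.offer(change);
    }

    /** Consumer side: take up to max buffered changes in arrival order. */
    List<Change> poll(int max) {
        List<Change> batch = new ArrayList<>();
        buffer.drainTo(batch, max);
        return batch;
    }

    /** Advance the checkpoint only after the consumer has processed the batch. */
    synchronized void ack(List<Change> batch) {
        for (Change c : batch) {
            checkpoint = Math.max(checkpoint, c.seqId);
        }
    }

    /** Position from which shipping would resume after a restart. */
    synchronized long checkpoint() {
        return checkpoint;
    }
}
```

With a capacity of 2, a third ship() call returns false until the consumer
has polled, and checkpoint() stays at -1 until ack() is called on the polled
batch.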

On Fri, Dec 4, 2020 at 7:09 AM Leon Bein <le...@student.hpi.de> wrote:

> Hi @all,
>
> we are currently developing an HBase source for Flink with the new API
> (FLIP-27
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> >).
> For this, we try to implement change data capturing on HBase.
>
> What are your opinions on which approach to pursue?
> Options that we found so far include
>
>   * using Coprocessors, e.g. RegionObserver
>   * reading the WAL e.g. with ProtobufLogReader (which is marked as
>     LimitedPrivate)
>   * and using ReplicationEndpoint (which also seem to be more internal).
>
> Best regards,
> Leon Bein
>
