Posted to dev@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2015/12/02 09:01:11 UTC

[jira] [Commented] (KAFKA-2914) Kafka Connect Source connector for HBase

    [ https://issues.apache.org/jira/browse/KAFKA-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035439#comment-15035439 ] 

Ewen Cheslack-Postava commented on KAFKA-2914:
----------------------------------------------

[~nielsbasjes] Agreed that an HBase source connector would be great, and thanks for the pointer on how other projects tap into the WAL. Something like this is probably the right way to hook into HBase, since the WAL gives you the complete picture of changes and the most flexibility in how to translate them into messages in Kafka.

The plan was to keep connector development federated, which means connectors like this would generally be maintained outside Kafka's source tree. This is partly a practical decision, since pulling in a large variety of connectors would drastically complicate Kafka, its packaging, and its release process. It also has useful side effects, such as decoupling connector release schedules from Kafka's so that connectors can iterate more quickly than Kafka itself.

We have one very simple set of connectors implemented in Kafka for demonstration purposes. While KAFKA-2375 is filed for an Elasticsearch connector, we only considered it as a possible example to include in Kafka itself, since it would be a more realistic example that doesn't have any extra dependencies.

I think an HBase connector would be hugely valuable, but it should probably be developed outside Kafka. I'll follow up soon with a template repository that can be used to bootstrap new connectors; it would be a good starting point for an HBase connector.

> Kafka Connect Source connector for HBase 
> -----------------------------------------
>
>                 Key: KAFKA-2914
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2914
>             Project: Kafka
>          Issue Type: New Feature
>          Components: copycat
>            Reporter: Niels Basjes
>            Assignee: Ewen Cheslack-Postava
>
> In many cases I see HBase being used to persist data.
> I would like to listen to the changes and process them in a streaming system (like Apache Flink).
> Feature request: A Kafka Connect "Source" that listens to the changes in a specified HBase table. These changes are then stored in a 'standardized' form in Kafka so that it becomes possible to process the observed changes in near real time. I expect this 'standard' to be very HBase-specific.
> Implementation suggestion: Perhaps by listening to the HBase WAL, as the "HBase Side Effects Processor" does?
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep


