Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2011/08/04 00:25:27 UTC
[jira] [Updated] (KAFKA-74) Kafka mirror (corp replica): auto-discovery of topics

     [ https://issues.apache.org/jira/browse/KAFKA-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-74:
----------------------------

    Attachment: svn_diff_1153618_1312409247

Patch for KAFKA-74 and KAFKA-75. It includes an enhanced embedded consumer system test that was used to verify the changes. The patch also fixes a small bug in the generation of consumer IDs that caused occasional collisions when multiple consumer connectors were instantiated on the same host at the same time.
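
For readers unfamiliar with the collision issue, here is a minimal sketch of one way to build consumer IDs that stay distinct when several connectors start on the same host in the same instant. It is not taken from the attached patch: the function name make_consumer_id and the ID layout (group, host, millisecond timestamp, random suffix) are assumptions made for illustration only.

```python
import socket
import time
import uuid


def make_consumer_id(group_id):
    """Build a consumer ID string (hypothetical layout, not the patch's).

    Host name plus timestamp alone can collide when two connectors are
    created on the same host within the same millisecond, so a random
    suffix is appended to tell them apart.
    """
    host = socket.gethostname()
    millis = int(time.time() * 1000)
    suffix = uuid.uuid4().hex[:8]  # 32 random bits as 8 hex characters
    return f"{group_id}_{host}-{millis}-{suffix}"
```

The random suffix makes a collision very unlikely rather than impossible; a longer suffix lowers the odds further.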

> Kafka mirror (corp replica): auto-discovery of topics
> -----------------------------------------------------
>
>                 Key: KAFKA-74
>                 URL: https://issues.apache.org/jira/browse/KAFKA-74
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Joel Koshy
>         Attachments: svn_diff_1153618_1312409247
>
>
> The corp replica's Kafka embedded consumer requires a whitelist of topics to be
> specified in its configuration. This does not scale well as more and more
> topics are added. Instead, it can keep track of the current topics in ZooKeeper.
> With this approach, there should be a blacklist configuration as well if the user
> wishes to omit designated topics from the replica.
> Furthermore, the "replica/replication" terms can become confusing when we start
> working on the replication feature. So, as part of this issue, we can address this
> ambiguity as well:
>
> Replication vs. Mirroring:
> Kafka's roadmap includes a "replication" feature 
> (https://issues.apache.org/jira/browse/KAFKA-50) that will improve its
> durability and availability guarantees. In the past, we have also used the
> term "replication" to describe the process of building a replica of a Kafka
> cluster. This is done by providing a consumer configuration when starting up
> a Kafka server. The configuration should contain a parameter
> (embeddedconsumer.topics) which is a whitelist of topics that the user
> wishes to replicate. The Kafka server then instantiates an embedded consumer
> to fetch the corresponding logs from the source cluster. The messages that
> the embedded consumer consumes are written to local Kafka logs.
> In order to avoid any confusion between the two features going forward, I
> think it will be good to make a clearer distinction. We can call the former
> feature "replication", and the latter feature (i.e., building a replica)
> "mirroring". So, if the user provides an (embedded) consumer configuration
> to the Kafka server, then it will implicitly run as a "mirror". We can also
> improve the clarity of the related config parameters as described below.
>
> Config change - Default topic whitelists for mirroring:
> The embedded consumer's whitelist is currently specified as part of
> ConsumerConfig. E.g., embeddedconsumer.topics=topic1:3,topic2:1. However, the
> common case is to mirror all topics. Therefore, it may be more convenient to
> discover topics through the source cluster's ZooKeeper, mirror all topics by
> default and provide a new blacklist configuration option. If you wish to
> mirror only a few topics, the whitelist option is still available.
> At most one of the following options can be present in the embedded
> consumer's configuration. If neither option is present, all topics will be
> mirrored.
> mirror.topics.blacklist: (topics to skip for mirroring)
> mirror.topics.whitelist: (alias for embeddedconsumer.topics, which can
> eventually be deprecated)
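
The selection rules proposed above can be sketched roughly as follows. This is an illustration, not code from the patch: the function names parse_topic_list and select_mirror_topics are hypothetical, the blacklist is assumed to be a comma-separated list of topic names, and the number after the colon in a whitelist entry is kept as an opaque integer (defaulting to 1 when absent, which is also an assumption).

```python
def parse_topic_list(value):
    """Parse 'topic1:3,topic2:1' into {'topic1': 3, 'topic2': 1}.

    Assumption: an entry without ':N' gets the value 1.
    """
    topics = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, count = entry.partition(":")
        topics[name.strip()] = int(count) if sep else 1
    return topics


def select_mirror_topics(available, config):
    """Return the sorted topics to mirror, given the source cluster's topics.

    At most one of whitelist/blacklist may be set; with neither set,
    every available topic is mirrored.
    """
    # mirror.topics.whitelist is treated as an alias for embeddedconsumer.topics.
    whitelist = (config.get("mirror.topics.whitelist")
                 or config.get("embeddedconsumer.topics"))
    blacklist = config.get("mirror.topics.blacklist")

    if whitelist and blacklist:
        raise ValueError("specify a whitelist or a blacklist, not both")
    if whitelist:
        wanted = parse_topic_list(whitelist)
        return sorted(t for t in available if t in wanted)
    if blacklist:
        # Assumption: the blacklist holds plain comma-separated topic names.
        skipped = {t.strip() for t in blacklist.split(",") if t.strip()}
        return sorted(t for t in available if t not in skipped)
    return sorted(available)
```

For example, with source topics ["clicks", "logs", "test"] and mirror.topics.blacklist=test, the sketch selects clicks and logs; with no option set it selects all three.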
