You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Joel Koshy (Updated) (JIRA)" <ji...@apache.org> on 2012/02/23 20:34:49 UTC

[jira] [Updated] (KAFKA-249) Separate out Kafka mirroring into a stand-alone app

     [ https://issues.apache.org/jira/browse/KAFKA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-249:
-----------------------------

    Attachment: KAFKA-249.v1.patch

Overview of changes:

- New abstract consumer-agent that embeds a topic event watcher to allow
  topic discovery (similar to what the embedded consumer does). It provides
  a processMessage hook that concrete implementations can specify.
- New stand-alone mirror-maker tool that extends from consumer-agent.
- Console consumer now also extends from consumer-agent, and supports
  wildcarding in the topic list.
- New mirror-maker system test that is similar to the embedded consumer
  system test, but tests with multiple source clusters. When/if we deprecate
  the embedded consumer we can delete its system test.

Some comments:

- Shutdown logic is somewhat tricky - and some of it was driven by
  requirements from console consumer. Let me know if you have suggestions on
  restructuring to make it simpler. The afterStoppingWorkerThread call is
  before the workersStoppedLatch.countDown so the concrete implementation
  should not block there forever. Or we could just move it after the
  countDown, or remove that hook altogether.
- I thought it would be best to continue to have the embedded consumer and
  deprecate it later. So I also kept the "mirror" prefix in the
  whitelist/blacklist consumer config options.
- The mirror-maker tool right now only uses one producer - which is "ok"
  with hacking around the broker.list config property to use multiple
  producer send threads underneath. However, we probably want multiple
  producers. So we can keep this under review until after Kafka-253 is
  merged into 0.7/trunk as that will affect this piece of code.
- Wildcarding is standard Java regex. However, for convenience a
  comma-separated list of topics is allowed, in which case ',' is replaced
  by '|'. In doing so, I have assumed that commas were never allowed as part
  of topic names.
- If the whitelist is non-trivial (where triviality is defined by having
  only alpha-numeric characters and '|') then a topic event watcher is
  created. Otherwise, no watcher is created. So for example, "whitetopic.*"
  would require a topic watcher, but "whitetopic01,whitetopic02" would not.
- I would have preferred making consumer agent a trait, but that would limit
  extending it in Java as it contains implementation.
- I needed to synchronize console consumer's processMessage to implement
  support for the maxMessages option. One way to avoid this complexity is to
  add maxMessages option to ConsumerConfig. If it is set, then we count
  numMessages as an atomic long in ConsumerAgent's loop and add (numMessages
  < maxMessages) to the Stream.continually condition.
- There is a consumer shutdown bug (KAFKA-282). It is a simple fix but I
  think it is better to keep that in a separate jira.

So how about we keep this pending until the producer refactoring has been
well-tested and makes it to trunk? In the interim I can also prepare changes
to update the Mirroring-howto and add a console consumer how-to as well.

(Also, thanks to Neha and Jun for helping with debugging this.)

                
> Separate out Kafka mirroring into a stand-alone app
> ---------------------------------------------------
>
>                 Key: KAFKA-249
>                 URL: https://issues.apache.org/jira/browse/KAFKA-249
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>            Reporter: Joel Koshy
>            Assignee: Joel Koshy
>             Fix For: 0.7.1
>
>         Attachments: KAFKA-249.v1.patch
>
>
> I would like to discuss on this jira, the feasibility/benefits of separating
> out Kafka's mirroring feature from the broker into a stand-alone app, as it
> currently has a couple of limitations and issues.
> For example, we recently had to deal with Kafka mirrors that were in fact
> idle due to the fact that mirror threads were not created at start-up due to
> a rebalancing exception, but the Kafka broker itself did not shutdown. This
> has since been fixed, but is indicative of (avoidable) problems in embedding
> non-broker specific features in the broker.
> Logically, it seems to make sense to separate it out to achieve better
> division of labor.  Furthermore, enhancements to mirroring may be less
> clunky to implement and use with a stand-alone app.  For example to support
> custom partitioning on the target cluster, or to mirror from multiple
> clusters we would probably need to be able to pass in multiple embedded
> consumer/embedded producer configs, which would be less ugly if the
> mirroring process were a stand-alone app.  Also, if we break it out, it
> would be convenient to use as a "consumption engine" for the console
> consumer which will make it easier to add on features such as wildcards in
> topic consumption, since it contains a ZooKeeper topic discovery component.
> Any suggestions and/or objections to this?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira