You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Gwen Shapira <gs...@cloudera.com> on 2015/02/01 07:34:01 UTC

Re: Review Request 30063: Patch for KAFKA-1840

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30063/#review70523
-----------------------------------------------------------


I like the idea of a message handler in MirrorMaker, but I think we can do better. Let me know if you think I'm taking it far beyond your original scope... I can add it as a follow up jira.

1. I think we need to let users pass parameters to the handlers. We need a "configure" or "init" method in the handler, which MirrorMaker will call once with the right properties and the handler can use them for basic setup. For example, imagine a "regexp filter" handler - I get a regexp from the commandline and filter messages that don't match. My "init" method will set up the regexp so it will be available for all handle() calls.
2. I think the handle() method should take List<Record> as input, not just a Record. MirrorMaker will be able to consume until it fills a batch (or until we waited too long), "handle" a batch - which will be able to use Scala's Sequence operators - Filter, Map, etc, and then produce an entire list of records. This sounds more efficient to me. 

Another comment - I think the addition of "destination partition" is unrelated to this change and Jira?

- Gwen Shapira


On Jan. 31, 2015, 2:25 a.m., Jiangjie Qin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30063/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2015, 2:25 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1840
>     https://issues.apache.org/jira/browse/KAFKA-1840
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into KAFKA-1840
> 
> 
> Addressed Joel's comments
> 
> 
> Allow message handler to specify partitions for produce
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 81ae205ef7b2050d0152f29f8da7dd91b17b8b00 
> 
> Diff: https://reviews.apache.org/r/30063/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>


Re: Review Request 30063: Patch for KAFKA-1840

Posted by Jiangjie Qin <be...@gmail.com>.

> On Feb. 1, 2015, 6:34 a.m., Gwen Shapira wrote:
> > I like the idea of a message handler in MirrorMaker, but I think we can do better. Let me know if you think I'm taking it far beyond your original scope... I can add it as a follow up jira.
> > 
> > 1. I think we need to let users pass parameters to the handlers. We need a "configure" or "init" method in the handler, which MirrorMaker will call once with the right properties and the handler can use them for basic setup. For example, imagine a "regexp filter" handler - I get a regexp from the commandline and filter messages that don't match. My "init" method will set up the regexp so it will be available for all handle() calls.
> > 2. I think the handle() method should take List<Record> as input, not just a Record. MirrorMaker will be able to consume until it fills a batch (or until we waited too long), "handle" a batch - which will be able to use Scala's Sequence operators - Filter, Map, etc, and then produce an entire list of records. This sounds more efficient to me. 
> > 
> > Another comment - I think the addition of "destination partition" is unrelated to this change and Jira?

I think it's a great idea to extend message handler to take commandline arguments. I'll add that to the this patch.
I'm not sure I understand why it will improve performance by passing in a list of record. Shouldn't the message handler decide whether it wants to buffer some records or not? Some worries about buffering messages outside message handler are:
1. It is usually very difficult to decide how many messages to buffer because it is likely depending on the business logic and could vary from time to time.
2. It might further increase the delay.

The addition of destination partition is actually mainly to serve partition to partition copy, which is something related to message handler. I feel it could be put into either KAFKA-1840 or KAFKA-1839. I put it here as it seems to be more related to the message handler because the handler has to assigne partition explicitly. 

Maybe as a follow up patch, I'm thinking that we can add a commandline option of --mirror-cluster so it does the following:
1. Automatically create same number of partitions in target cluster when new topic is created in source cluster. (Based on KAFKA-1839)
2. Produce the messages to the destination partition same as the source partition. (Follow-up patch for KAFKA-1840)


- Jiangjie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30063/#review70523
-----------------------------------------------------------


On Jan. 31, 2015, 2:25 a.m., Jiangjie Qin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30063/
> -----------------------------------------------------------
> 
> (Updated Jan. 31, 2015, 2:25 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1840
>     https://issues.apache.org/jira/browse/KAFKA-1840
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into KAFKA-1840
> 
> 
> Addressed Joel's comments
> 
> 
> Allow message handler to specify partitions for produce
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 81ae205ef7b2050d0152f29f8da7dd91b17b8b00 
> 
> Diff: https://reviews.apache.org/r/30063/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>