Posted to dev@kafka.apache.org by "Joel Koshy (Created) (JIRA)" <ji...@apache.org> on 2012/04/13 23:51:17 UTC

[jira] [Created] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Mirroring should use multiple producers; add producer retries to DefaultEventHandler
------------------------------------------------------------------------------------

                 Key: KAFKA-332
                 URL: https://issues.apache.org/jira/browse/KAFKA-332
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Joel Koshy
            Assignee: Joel Koshy
            Priority: Minor


I'm clubbing these two together as these are both important for mirroring.

(1) Multiple producers:

Shallow iteration (KAFKA-315) helps improve mirroring throughput when
messages are compressed. With shallow iteration, the mirror-maker's consumer
does not do deep iteration over compressed messages. However, when its
embedded producer sends these messages to the target cluster's brokers, the
receiving broker does deep iteration to validate the messages before
appending to the log.

In the current (pre-KAFKA-48) request handling mechanism, one producer
effectively translates to one server-side thread for handling produce
requests, so decompression during message validation is still a
bottleneck on the target broker.

One way to work around this is to use broker.list with the same broker
listed multiple times under different ids. E.g.,
broker.list=0:localhost:9191,1:localhost:9191,2:localhost:9191,... which
effectively emulates multiple server-side threads. It would be better to
just add a num.producers option to the mirror-maker and instantiate that
many producers.
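
To make the broker.list workaround concrete, here is a minimal sketch that parses a value in the id:host:port form shown above and counts how many distinct endpoints sit behind the listed ids. BrokerListSketch and its methods are hypothetical helpers written for illustration; this is not Kafka's own parsing code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not Kafka's own parser: splits a broker.list value
// of the form id:host:port,id:host:port,...
final class BrokerListSketch {

    // One parsed entry of broker.list.
    static final class Entry {
        final int id;
        final String host;
        final int port;

        Entry(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }

        String endpoint() {
            return host + ":" + port;
        }
    }

    static List<Entry> parse(String brokerList) {
        List<Entry> entries = new ArrayList<>();
        for (String item : brokerList.split(",")) {
            String[] parts = item.trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected id:host:port, got: " + item);
            }
            entries.add(new Entry(Integer.parseInt(parts[0]), parts[1], Integer.parseInt(parts[2])));
        }
        return entries;
    }

    // Number of distinct host:port endpoints behind the listed broker ids.
    static int distinctEndpoints(List<Entry> entries) {
        Set<String> endpoints = new LinkedHashSet<>();
        for (Entry e : entries) {
            endpoints.add(e.endpoint());
        }
        return endpoints.size();
    }

    public static void main(String[] args) {
        List<Entry> entries = parse("0:localhost:9191,1:localhost:9191,2:localhost:9191");
        // Three ids are listed, but they all point at the same broker.
        System.out.println(entries.size() + " broker ids, "
                + distinctEndpoints(entries) + " distinct endpoint(s)");
    }
}
```

The sketch only shows that the three ids alias one endpoint; how the producer maps each id to a connection is not modeled here.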

(2) Retries:

If the mirror-maker uses broker.list and one of the brokers is bounced for
any reason, messages can get lost. Message loss can be reduced/avoided if
the brokers are behind a VIP and if retries are supported. This option will
not work for the zk-based producer because the decision of which broker to
send to has already been made, so retries would go to the same (potentially
still down) broker. (With KAFKA-253 it would work for zk-based producers as
well, but that is only in 0.8.)
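
The retry idea can be sketched roughly as follows. This is not the DefaultEventHandler code from the patch: RetrySketch, Sender, and sendWithRetries are hypothetical names, and the numRetries parameter only loosely mirrors a num.retries setting. The sketch assumes each attempt goes through whatever address the sender was given, so a retry only helps when that address (e.g. a VIP) can route to a different, live broker.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RetrySketch {

    // Hypothetical stand-in for whatever actually ships a batch to a broker.
    interface Sender {
        void send(List<String> batch) throws Exception;
    }

    // Tries the send once, then up to numRetries more times.
    // Returns the number of attempts used; if every attempt fails,
    // throws a RuntimeException wrapping the last failure.
    static int sendWithRetries(Sender sender, List<String> batch,
                               int numRetries, List<String> traceLog) {
        Throwable last = null;
        for (int attempt = 0; attempt <= numRetries; attempt++) {
            try {
                sender.send(batch);
                return attempt + 1;
            } catch (Throwable t) {
                // Catch everything so one bad send cannot kill the sending thread,
                // and record which attempt failed.
                last = t;
                traceLog.add("attempt " + (attempt + 1) + " of " + (numRetries + 1)
                        + " failed: " + t);
            }
        }
        throw new RuntimeException("giving up after " + (numRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Simulated broker bounce: the first two sends fail, the third succeeds.
        Sender flaky = batch -> {
            if (++calls[0] <= 2) {
                throw new IOException("broker bounced");
            }
        };
        List<String> traceLog = new ArrayList<>();
        int attempts = sendWithRetries(flaky, Arrays.asList("m1", "m2"), 3, traceLog);
        System.out.println("succeeded after " + attempts + " attempts");
        for (String line : traceLog) {
            System.out.println(line);
        }
    }
}
```

With a zk-based producer the target broker is already chosen before the send, so a loop like this would keep hitting the same broker.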


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256026#comment-13256026 ] 

Jun Rao commented on KAFKA-332:
-------------------------------

Some comments:
1. DefaultEventHandler:
1.1 It would be useful to see the retry # in the trace log.
1.2 We should catch all Throwables.

2. ProducerConfig: explain a bit more why num.retries is not appropriate for the zk-based producer. Basically, during resend, we don't re-select brokers.

3. MirrorMaker: The usage of circularIterator is pretty fancy. Would it be simpler to just put all producers in an array and loop through it circularly?
                

[jira] [Updated] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-332:
-----------------------------

    Attachment: KAFKA-332-v3.patch

tryToHandle now catches all Throwables.
                

[jira] [Updated] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Joel Koshy (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-332:
-----------------------------

    Attachment: KAFKA-332-v2.patch

Thanks for the review.

- Captured throwable
- Trace logging retry attempt
- Additional doc on num.retries

Circular iterator: I thought it would be a convenient pattern for round-robin selection and in fact makes the code simpler/clearer. If it is hard to read, we can just go with explicit modulo-based selection using a counter - although IMO that is less clean.
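
For comparison, the two round-robin styles discussed here might look like the sketch below. It is an illustration only: RoundRobinSketch, circular, and pick are hypothetical names, plain strings stand in for producers, and the patch's own circularIterator may be written differently.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class RoundRobinSketch {

    // Style 1: an endless iterator that wraps around to the start of the list.
    // Not thread-safe; callers sharing it across threads would need to synchronize.
    static <T> Iterator<T> circular(final List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("need at least one item");
        }
        return new Iterator<T>() {
            private int index = 0;

            public boolean hasNext() {
                return true; // never runs out
            }

            public T next() {
                T item = items.get(index);
                index = (index + 1) % items.size();
                return item;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Style 2: explicit modulo on a non-negative running counter kept by the caller.
    static <T> T pick(List<T> items, long counter) {
        return items.get((int) (counter % items.size()));
    }

    public static void main(String[] args) {
        List<String> producers = Arrays.asList("producer-0", "producer-1", "producer-2");
        Iterator<String> it = circular(producers);
        for (long i = 0; i < 5; i++) {
            // Both styles select the same producer for the i-th message.
            System.out.println(it.next() + " / " + pick(producers, i));
        }
    }
}
```

The iterator hides the counter from the call site, while the modulo version keeps the state visible to the caller; which reads better is the judgment call under discussion.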

                

[jira] [Updated] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Joel Koshy (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-332:
-----------------------------

    Attachment: KAFKA-332-v1.patch
    

[jira] [Commented] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258667#comment-13258667 ] 

Jun Rao commented on KAFKA-332:
-------------------------------

Patch v2 looks good. One more comment:
4. ProducerSendThread.tryToHandle() should catch Throwable too.

                

[jira] [Updated] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-332:
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.7.1
           Status: Resolved  (was: Patch Available)

Thanks for the patch. Committed to trunk.
                

[jira] [Updated] (KAFKA-332) Mirroring should use multiple producers; add producer retries to DefaultEventHandler

Posted by "Joel Koshy (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-332:
-----------------------------

    Status: Patch Available  (was: Open)
    