Posted to dev@kafka.apache.org by "Neha Narkhede (Created) (JIRA)" <ji...@apache.org> on 2012/02/06 23:00:59 UTC

[jira] [Created] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts
-------------------------------------------------------------------------------------------------------------

                 Key: KAFKA-265
                 URL: https://issues.apache.org/jira/browse/KAFKA-265
             Project: Kafka
          Issue Type: Sub-task
          Components: core
    Affects Versions: 0.7
            Reporter: Neha Narkhede
            Assignee: Jun Rao
             Fix For: 0.7.1


The correct fix for KAFKA-262, and for other known issues with the current consumer rebalancing approach, is to get rid of the cache in the zookeeper consumer.
The side effect of that fix, though, is a large number of zookeeper notifications, each of which will trigger a full rebalance operation on the consumer.

Ideally, the zookeeper notifications would be batched so that only one rebalance operation is triggered for several such ZK notifications.
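
One way to picture the batching idea is a queue that the listener fills and the rebalancing thread drains in one go. The sketch below is only an illustration under that assumption: the class and method names are hypothetical and it is not the code from any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; names are illustrative and not taken from the Kafka code base.
class NotificationBatcher {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Only touched by the thread that calls rebalanceOnce().
    private int rebalanceCount = 0;

    // Called by the ZooKeeper listener for every notification it receives.
    void onNotification(String event) {
        pending.add(event);
    }

    // Called by the rebalancing thread: drains everything queued so far and
    // performs at most one rebalance for the whole batch.
    // Returns the number of notifications covered by this call.
    int rebalanceOnce() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        if (!batch.isEmpty()) {
            rebalanceCount++; // stand-in for the real rebalance operation
        }
        return batch.size();
    }

    int rebalanceCount() {
        return rebalanceCount;
    }
}
```

With this shape, five notifications that arrive before the rebalancing thread runs are covered by a single rebalance.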

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks for the review. Just committed this.
                
> Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-265
>                 URL: https://issues.apache.org/jira/browse/KAFKA-265
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Jun Rao
>             Fix For: 0.7.1
>
>         Attachments: kafka-265.patch, kafka-265_v2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The correct fix for KAFKA-262 and other known issues with the current consumer rebalancing approach, is to get rid of the cache in the zookeeper consumer. 
> The side-effect of that fix, though, is the large number of zookeeper notifications that will trigger a full rebalance operation on the consumer. 
> Ideally, the zookeeper notifications can be batched and only one rebalance operation can be triggered for several such ZK notifications. 


[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Attachment: kafka-265_v2.patch

That's a good idea. Attached patch v2.
                


[jira] [Commented] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209846#comment-13209846 ] 

Neha Narkhede commented on KAFKA-265:
-------------------------------------

+1. Doesn't the test in system_test/broker_failure catch this?
                


[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Attachment: kafka-265_await_fix.patch

Found a bug. Condition should use await/signal, instead of wait/notify. Attached a patch.
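
For readers unfamiliar with the distinction: a java.util.concurrent.locks.Condition is also a plain Object, so calling wait() on it compiles, but it uses the object's monitor rather than the Lock the Condition belongs to. The sketch below shows the difference. It is an assumption about the kind of bug described here, not an excerpt from the patch, and the class name is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not taken from the Kafka code base.
class AwaitVersusWait {
    // Returns true if Object.wait(), called on a Condition while holding only
    // the ReentrantLock, fails with IllegalMonitorStateException.
    static boolean waitFailsUnderLock() {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        lock.lock();
        try {
            // Object.wait needs the monitor of `cond`, which this thread does not hold.
            cond.wait(1L);
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Returns true if Condition.await() completes normally under the same kind
    // of lock. Nobody signals, so the call simply times out.
    static boolean awaitCompletesUnderLock() {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        lock.lock();
        try {
            cond.await(1L, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```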
                


[jira] [Commented] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204981#comment-13204981 ] 

Neha Narkhede commented on KAFKA-265:
-------------------------------------

If the queue is full, the ZkClient listener can hang temporarily. This is not ideal, since ZkClient will not be able to deliver more events until a rebalance operation is completed and the queue is cleared. In practice, this might not be a big issue, but it can be easily avoided.
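
To illustrate the concern, assuming the queue is a bounded java.util.concurrent.BlockingQueue (the thread does not say which queue type or method the patch uses): put() blocks the calling thread once the queue is full, whereas offer() returns false instead. The class below is a hypothetical sketch that only counts how many events fit when nothing drains the queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class; not taken from the Kafka code base.
class BoundedQueueDemo {
    // Returns how many of `events` notifications a queue of the given capacity
    // accepts when no consumer drains it.
    static int acceptedWithoutBlocking(int capacity, int events) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < events; i++) {
            // offer() returns false on a full queue; put() would block the
            // calling (listener) thread at this point instead.
            if (queue.offer("event-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }
}
```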

I think there is an alternative solution to this problem, one that will 

1. avoid maintaining this queue 
2. reduce memory consumption in the consumer
3. avoid adding another config option 

How about just using a boolean variable that indicates that at least one rebalancing operation has been requested? The watcher thread can use a Condition to wait while the boolean variable is false. The ZK listener can merely set the boolean to true and signal the Condition, so that the watcher thread can proceed with a rebalancing operation.
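
A minimal sketch of that proposal, assuming a ReentrantLock with one Condition. The class and method names are hypothetical, and the timeout parameter is added only so the example can be exercised without hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the boolean-plus-Condition idea; not the committed code.
class RebalanceSignal {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition requested = lock.newCondition();
    private boolean rebalanceRequested = false;

    // ZK listener side: sets the flag and signals. Any number of calls made
    // before the watcher wakes up collapse into a single pending request.
    void requestRebalance() {
        lock.lock();
        try {
            rebalanceRequested = true;
            requested.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Watcher thread side: waits until at least one request is pending, then
    // clears the flag and returns true. Returns false if the timeout elapses first.
    boolean awaitRequest(long timeoutMs) throws InterruptedException {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            while (!rebalanceRequested) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = requested.awaitNanos(nanos);
            }
            rebalanceRequested = false;
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The flag is checked in a loop because a Condition may wake up spuriously. Compared with a queue, there is nothing to size and nothing that can fill up.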
                


[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Status: Patch Available  (was: Open)

Patch attached.
                


[jira] [Commented] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207200#comment-13207200 ] 

Neha Narkhede commented on KAFKA-265:
-------------------------------------

+1 for v2. Thanks for accommodating the change.
                


[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Attachment: kafka-265_shutdown.patch

Found a new problem. The watch executor thread doesn't shut down properly. Attached another patch.
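
The thread does not say what exactly went wrong, so the following is only a generic sketch of how a thread parked on a Condition can be shut down cleanly: set a flag under the lock, signal the Condition, and join the thread. All names are hypothetical and this is not the content of kafka-265_shutdown.patch.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not taken from the Kafka code base.
class WatcherThreadSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean rebalanceRequested = false;
    private boolean shuttingDown = false;
    private final Thread watcher = new Thread(this::runLoop, "watcher-sketch");

    void start() {
        watcher.start();
    }

    void requestRebalance() {
        lock.lock();
        try {
            rebalanceRequested = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Sets the shutdown flag and signals, so that a watcher parked in await
    // wakes up and exits. Returns true if the thread ended within the timeout.
    boolean shutdown(long timeoutMs) {
        lock.lock();
        try {
            shuttingDown = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
        try {
            watcher.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !watcher.isAlive();
    }

    private void runLoop() {
        while (true) {
            lock.lock();
            try {
                while (!rebalanceRequested && !shuttingDown) {
                    cond.awaitUninterruptibly();
                }
                if (shuttingDown) {
                    return; // the finally block still releases the lock
                }
                rebalanceRequested = false;
            } finally {
                lock.unlock();
            }
            rebalance(); // performed outside the lock
        }
    }

    private void rebalance() {
        // stand-in for the real rebalance operation
    }
}
```

Without the signal in shutdown(), a watcher that is already waiting would never re-check the flag and the thread would stay alive.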
                


[jira] [Commented] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209992#comment-13209992 ] 

Jun Rao commented on KAFKA-265:
-------------------------------

Committed the fix for both Condition and shutdown to trunk.
                


[jira] [Updated] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Jun Rao (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated KAFKA-265:
--------------------------

    Attachment: kafka-265.patch
    


[jira] [Commented] (KAFKA-265) Add a queue of zookeeper notifications in the zookeeper consumer to reduce the number of rebalancing attempts

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208848#comment-13208848 ] 

Neha Narkhede commented on KAFKA-265:
-------------------------------------

Subtle, but critical :-) 
+1
                
