Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2012/11/18 02:04:12 UTC

[jira] [Comment Edited] (KAFKA-612) move shutting down of fetcher thread out of critical path

    [ https://issues.apache.org/jira/browse/KAFKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499572#comment-13499572 ] 

Neha Narkhede edited comment on KAFKA-612 at 11/18/12 1:03 AM:
---------------------------------------------------------------

Overall, the fix looks good. Here are a few review comments -

1. AbstractFetcherThread
1.1 I'm trying to understand what can cause a particular topic partition in the response object to have no offset in the fetchMap. Right now, we ignore this case altogether. I'm guessing this could happen if the leader changes after the broker sends the fetch response, which is ok. Wondering if this is what was intended, or if we should log and handle the error case?
1.2 hasPartition and partCount could use a refactor. You can return a value from inside a try block; the finally block will still execute and release the fetch map lock.
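To illustrate point 1.2, here is a minimal self-contained sketch of the idiom — the value of the try expression becomes the method's return value, and the finally clause still runs and releases the lock, so no temporary variable is needed. The names (FetchOffsetTracker, mapLock, offsets) are illustrative only, not the actual Kafka code:

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable

// Illustrative stand-in for the map guarded by a lock in AbstractFetcherThread.
class FetchOffsetTracker {
  private val mapLock = new ReentrantLock
  private val offsets = mutable.Map[(String, Int), Long]()

  def addPartition(topic: String, partition: Int, offset: Long): Unit = {
    mapLock.lock()
    try offsets.put((topic, partition), offset)
    finally mapLock.unlock()
  }

  // The try expression's value is returned; finally unlocks on the way out.
  def hasPartition(topic: String, partition: Int): Boolean = {
    mapLock.lock()
    try offsets.contains((topic, partition))
    finally mapLock.unlock()
  }

  def partCount(): Int = {
    mapLock.lock()
    try offsets.size
    finally mapLock.unlock()
  }
}
```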

2. ReplicaManager
The right way to measure leader election latency is at the controller, by measuring the time from when it sent a leader/isr request to when it received the corresponding ACK from the broker. Given that, we should probably move the following lines to the end of handleLeaderAndIsrRequest in KafkaApis.
    info("Completed leader and isr request %s".format(leaderAndISRRequest))
    replicaFetcherManager.shutdownEmptyFetcherThread()
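A self-contained toy demonstrating why this ordering matters: slow cleanup done before the ACK is sent inflates the latency the controller observes, while the same cleanup done after the ACK does not. All names here are illustrative, and the 200 ms sleep merely stands in for the fetcher thread shutdown cost mentioned in the issue:

```scala
// Toy model of the critical path: the controller's latency measurement
// stops at the ack, so anything done before the ack is counted.
object CriticalPathDemo {
  def slowShutdown(): Unit = Thread.sleep(200) // stand-in for fetcher shutdown

  // Shutdown inside the measured path: the ack is delayed by the cleanup.
  def handleInCriticalPath(ack: () => Unit): Unit = {
    slowShutdown()
    ack()
  }

  // Shutdown moved after the ack: cleanup no longer delays leader election.
  def handleOutOfCriticalPath(ack: () => Unit): Unit = {
    ack()
    slowShutdown()
  }

  // Returns milliseconds elapsed until the handler invoked the ack.
  def measure(handler: (() => Unit) => Unit): Long = {
    var ackedAt = 0L
    val start = System.nanoTime()
    handler(() => ackedAt = System.nanoTime())
    (ackedAt - start) / 1000000
  }
}
```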
 
3. AbstractFetcherThread 
3.1 fetchMap is a confusing name; this map is keeping track of fetch offsets per topic partition. I haven't looked at this code in much detail before, but when I read it while reviewing this patch, I thought the names of the maps could be improved. For example, the map in AbstractFetcherManager is called fetcherThreadMap and the map in this class is called fetcherMap.
3.2 How about renaming shutdownEmptyFetcherThread to shutdownIdleFetcherThreads?
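For concreteness, a sketch of how the suggested renames might read — the surrounding classes and the TopicAndPartition case class are stand-ins, not the real Kafka types:

```scala
import scala.collection.mutable

// Stand-in for Kafka's topic/partition key type.
case class TopicAndPartition(topic: String, partition: Int)

class AbstractFetcherThreadSketch {
  // was: fetchMap -- the new name says what the values are (fetch offsets).
  val fetchOffsetMap = mutable.Map[TopicAndPartition, Long]()
}

class ReplicaFetcherManagerSketch {
  // was: shutdownEmptyFetcherThread -- plural, and "idle" says when it applies.
  def shutdownIdleFetcherThreads(): Int = {
    // would shut down fetcher threads that own no partitions; omitted here,
    // returns the number of threads shut down
    0
  }
}
```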

                
> move shutting down of fetcher thread out of critical path
> ---------------------------------------------------------
>
>                 Key: KAFKA-612
>                 URL: https://issues.apache.org/jira/browse/KAFKA-612
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>         Attachments: kafka-612.patch
>
>
> Shutting down a fetch thread seems to take more than 200 ms, since we need to interrupt the thread. Currently, we shut down fetcher threads while processing a leaderAndIsr request. This can delay some partitions from becoming leaders.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira