Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2014/09/05 00:28:25 UTC

[jira] [Updated] (KAFKA-747) RequestChannel re-design

     [ https://issues.apache.org/jira/browse/KAFKA-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-747:
--------------------------------
    Fix Version/s:     (was: 0.8.2)
                   0.9.0

> RequestChannel re-design
> ------------------------
>
>                 Key: KAFKA-747
>                 URL: https://issues.apache.org/jira/browse/KAFKA-747
>             Project: Kafka
>          Issue Type: New Feature
>          Components: network
>            Reporter: Jay Kreps
>            Assignee: Neha Narkhede
>             Fix For: 0.9.0
>
>
> We have had some discussion around how to handle queuing requests. There are two competing concerns:
> 1. We need to maintain request order on a per-socket basis.
> 2. We want to be able to balance load flexibly over a pool of threads so that if one thread blocks on I/O, request processing continues on the others.
> Two Approaches We Have Considered
> 1. Have a global queue of unprocessed requests. All I/O threads read requests off this global queue and process them. To avoid re-ordering, have the network layer read only one request at a time.
> 2. Have a queue per I/O thread and have the network threads statically map sockets to I/O thread request queues.
> Problems With These Approaches
> In the first case you are not able to get any per-producer parallelism. That is, you can't read the next request from a socket while the current one is being handled. This seems like it would not be a big deal, but preliminary benchmarks show that it might be.
> In the second case there are two problems. The first is that when an I/O thread gets blocked, all request processing for sockets attached to that I/O thread will grind to a halt. If you have 10,000 connections and 10 I/O threads, then each blockage will stop 1,000 producers. If there is one topic that has long synchronous flush times enabled (or is experiencing fsync locking), this will cause big latency blips for all producers using that I/O thread. The second problem is around backpressure and memory management. Say we use BlockingQueues to feed the I/O threads, and say that one I/O thread stalls. Its request queue will fill up and it will then block ALL network threads, since they will block on inserting into that queue, even though the other I/O threads are unused and have empty queues.
> A Proposed Better Solution
> The problem with the first solution is that we are not pipelining requests. The problem with the second approach is that we are too constrained in moving work from one I/O thread to another.
> Instead we should have a single request queue-like structure, but internally enforce the condition that requests are not re-ordered.
> Here are the details. We retain RequestChannel but refactor its internals. Internally we replace the blocking queue with a linked list. We also keep an in-flight-keys array with one entry per I/O thread. When removing a work item from the list we can't just take the first item. Instead we walk the list and look for a request whose key is not in the in-flight-keys array. When a response is sent, we remove that key from the in-flight array.
> This guarantees that requests for a socket with key K are ordered, but that processing for K can only block requests made by K.
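The proposal above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Kafka's actual RequestChannel code; all names (KeyAwareRequestQueue, poll, complete, etc.) are made up for this sketch, and a per-key in-flight set is used in place of the per-I/O-thread array described in the ticket, since the invariant it enforces is the same: at most one request per socket key is in flight at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical sketch of the proposed key-aware request queue.
// Requests for the same socket key stay ordered; a stalled key only
// blocks its own requests, not the whole queue.
class KeyAwareRequestQueue<K, R> {
    private static final class Entry<K, R> {
        final K key;
        final R request;
        Entry(K key, R request) { this.key = key; this.request = request; }
    }

    private final Deque<Entry<K, R>> pending = new ArrayDeque<>();
    private final Set<K> inFlight = new HashSet<>();

    // Network thread enqueues a request read from socket `key`.
    public synchronized void add(K key, R request) {
        pending.addLast(new Entry<>(key, request));
    }

    // I/O thread takes the first request whose key is not already being
    // processed; returns null if no request is currently eligible.
    public synchronized R poll() {
        Iterator<Entry<K, R>> it = pending.iterator();
        while (it.hasNext()) {
            Entry<K, R> e = it.next();
            if (!inFlight.contains(e.key)) {
                inFlight.add(e.key);   // mark the key as in flight
                it.remove();
                return e.request;
            }
        }
        return null;
    }

    // Called once the response for this key has been sent.
    public synchronized void complete(K key) {
        inFlight.remove(key);
    }
}
```

With two requests queued for key 1 and one for key 2, the first poll returns key 1's oldest request; the next poll skips key 1 (in flight) and returns key 2's request, so one slow key never blocks the others.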



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)