Posted to dev@kafka.apache.org by Jun Rao <ju...@gmail.com> on 2013/02/12 15:44:16 UTC

Clients and partition leaders

David,

The benefit of the strategy used in Solr is that it simplifies client
routing. The downside is potential additional RPC overhead and a bit more
logic in the server. Technically, you can achieve what Solr does in the
client layer too: run a proxy that uses the Java version of the Kafka
producer and exposes a RESTful API, and have your non-Java client talk to
the proxy.
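A minimal sketch of the proxy idea, in Python for brevity. Everything here is an assumption for illustration: the `POST /topics/<topic>` route, the `{"message": ...}` JSON body, and `send_to_kafka`, which is a hypothetical stand-in for handing the message to the Java Kafka producer that a real proxy would wrap.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SENT = []  # records what the stand-in producer received


def send_to_kafka(topic, message):
    # Hypothetical stand-in: a real proxy would pass this to the Java producer.
    SENT.append((topic, message))


class ProduceHandler(BaseHTTPRequestHandler):
    # Hypothetical route: POST /topics/<topic> with body {"message": "..."}
    def do_POST(self):
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "topics":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        send_to_kafka(parts[1], body.get("message", ""))
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_proxy(port=0):
    """Serve on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ProduceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_message(port, topic, message):
    """What a non-Java client would do: one plain HTTP request."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/topics/{topic}",
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    server = start_proxy()
    print(post_message(server.server_address[1], "events", "hello"))
    server.shutdown()
```

The client side needs nothing Kafka-specific, which is the appeal of the suggestion; the price is the extra hop through the proxy.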

We do plan to support a RESTful API for the producer in the future.
Adopting the Solr strategy needs more thought since, currently, not every
broker knows the leader of every partition.

Thanks,

Jun

---------- Forwarded message ----------
From: David Arthur <mu...@gmail.com>
Date: Mon, Feb 11, 2013 at 7:45 AM
Subject: Clients and replica leaders
To: "dev@kafka.apache.org" <de...@kafka.apache.org>


In writing a client for 0.8, I now have to keep track of which broker
owns each topic+partition. This is inherently a pain to deal with, and it
has the downside that I must wait for an error before I learn about a
change in the broker topology.
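Roughly, the bookkeeping described here looks like the sketch below (Python for brevity). `fetch_metadata`, `NotLeaderError` and the `brokers` callables are hypothetical stand-ins for a metadata request, a broker's "not the leader" error, and a produce call; they are not the 0.8 wire protocol. Note that the client only discovers its map is stale when a send fails.

```python
class NotLeaderError(Exception):
    """Hypothetical error a broker raises when it does not lead the partition."""


class LeaderCache:
    """Client-side map of (topic, partition) -> leader broker id.

    `fetch_metadata` is a hypothetical callable standing in for a metadata
    request; it returns a dict {(topic, partition): broker_id}.
    """

    def __init__(self, fetch_metadata):
        self._fetch = fetch_metadata
        self._leaders = {}

    def leader_for(self, topic, partition):
        key = (topic, partition)
        if key not in self._leaders:
            self._leaders.update(self._fetch())
        return self._leaders[key]

    def send(self, topic, partition, message, brokers, retries=1):
        """Send via the cached leader; on a leader error, refresh and retry."""
        for _ in range(retries + 1):
            broker_id = self.leader_for(topic, partition)
            try:
                return brokers[broker_id](topic, partition, message)
            except NotLeaderError:
                # The cached leader is stale; the client only learns this
                # from the error, so drop the cache and ask again.
                self._leaders.clear()
        raise NotLeaderError(f"no leader found for {topic}-{partition}")
```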

It would be nice if the clients didn't need to know so much about the
brokers. In Apache Solr, which has a similar partition+replication
strategy, each server (broker) can handle requests for any shard
(partition) in the cluster. If the current server happens to be the leader,
it processes the request; if not, it forwards the request to the correct
server, waits for a response, and then forwards the response back to the
client.

Dumb clients will pay the extra cost of the additional hop, but do not need
to know anything about the brokers. Smart clients will work basically like
they would now, with the added benefit of not getting an error when the
leader changes.
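A toy, in-process sketch of the Solr-style routing described above (Python). It assumes every broker can see the full leader map, which, per Jun's reply, 0.8 brokers cannot; the `Broker` class and its `handle_produce` method are hypothetical.

```python
class Broker:
    """Toy broker illustrating Solr-style request forwarding.

    Assumes every broker sees the full leader map (`leaders`).
    """

    def __init__(self, broker_id, leaders, cluster):
        self.broker_id = broker_id
        self.leaders = leaders  # shared {(topic, partition): broker_id}
        self.cluster = cluster  # shared {broker_id: Broker}
        self.log = []           # messages this broker stored as leader

    def handle_produce(self, topic, partition, message):
        leader_id = self.leaders[(topic, partition)]
        if leader_id == self.broker_id:
            # This broker is the leader: process the request itself.
            self.log.append((topic, partition, message))
            return {"stored_by": self.broker_id, "hops": 0}
        # Not the leader: forward, wait for the reply, pass it back.
        response = self.cluster[leader_id].handle_produce(topic, partition, message)
        return {**response, "hops": response["hops"] + 1}


if __name__ == "__main__":
    leaders = {("events", 0): 2}
    cluster = {}
    for i in (1, 2):
        cluster[i] = Broker(i, leaders, cluster)
    # A "dumb" client talks to broker 1, which is not the leader.
    print(cluster[1].handle_produce("events", 0, "hi"))
```

A dumb client can send to any broker and pays one extra hop (`hops == 1`) when it picks a non-leader. If two brokers ever held conflicting leader maps, naive forwarding like this could bounce a request between them, so a real design would need a hop limit or a consistent view of leadership.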

Would a strategy like this work in 0.8? Do the brokers know about one
another?

-David

Re: Clients and partition leaders

Posted by Jay Kreps <ja...@gmail.com>.
This is a good idea. There are actually two ways to implement this:
1. A RESTful interface, as Jun mentions. This might make more sense,
since if you don't mind the overhead of sending all the data twice,
you probably won't mind the overhead of HTTP.
2. Re-route misdirected requests in the brokers.

Implementing re-routing effectively requires a non-blocking request
router. Otherwise you end up blocking a thread just waiting on the
forwarded request. This might be okay (maybe just double the number of
threads) but isn't ideal. Our producer doesn't currently do this kind of
request pipelining, so a naive implementation that simply sent a produce
request whenever the request wasn't local wouldn't quite work. The latter
strategy could be implemented much better once we have a non-blocking
producer.
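To illustrate the point about blocking, here is a small asyncio sketch (Python) in which a single thread forwards non-local requests and keeps serving others while it waits. `leader_produce` is a hypothetical remote leader simulated with a 0.1 s sleep; this shows the non-blocking idea only, not Kafka's actual request handling.

```python
import asyncio


async def leader_produce(message, delay=0.1):
    """Hypothetical remote leader: simulates a network round trip."""
    await asyncio.sleep(delay)
    return f"ack:{message}"


async def route(message, is_local):
    """Handle locally, or forward without tying up a thread while waiting."""
    if is_local:
        return f"local:{message}"
    # `await` yields to the event loop, so other requests keep being
    # served on this same thread until the leader replies.
    return await leader_produce(message)


async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    # Ten requests; the odd-numbered ones are not local and get forwarded.
    results = await asyncio.gather(
        *(route(f"m{i}", is_local=(i % 2 == 0)) for i in range(10))
    )
    return results, loop.time() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

With a blocking router, each of the five forwarded requests would hold a thread for the full round trip; here they overlap, so the whole batch finishes in roughly one round trip on one thread.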

-Jay


On Wed, Feb 13, 2013 at 7:08 AM, David Arthur <mu...@gmail.com> wrote:
> Thanks, Jun, this answers my questions.
>
> I wasn't necessarily thinking of an HTTP interface like Solr, but rather the
> way it routes requests to leaders. However, since brokers are not aware of
> all the partition leaders, the Solr approach will not work.
>
> I actually worked a bit on a REST interface a while ago:
> https://github.com/mumrah/kafka/tree/rest/contrib/rest-proxy, and once 0.8
> is out, I might pick it up and clean it up a bit.
>
> -David

Re: Clients and partition leaders

Posted by David Arthur <mu...@gmail.com>.
Thanks, Jun, this answers my questions.

I wasn't necessarily thinking of an HTTP interface like Solr, but rather
the way it routes requests to leaders. However, since brokers are not
aware of all the partition leaders, the Solr approach will not work.

I actually worked a bit on a REST interface a while ago:
https://github.com/mumrah/kafka/tree/rest/contrib/rest-proxy, and once 0.8
is out, I might pick it up and clean it up a bit.

-David

