Posted to dev@kafka.apache.org by "Valentin (JIRA)" <ji...@apache.org> on 2014/09/29 00:49:34 UTC

[jira] [Commented] (KAFKA-1655) Allow high performance SimpleConsumer use cases to still work with new Kafka 0.9 consumer APIs

    [ https://issues.apache.org/jira/browse/KAFKA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151249#comment-14151249 ] 

Valentin commented on KAFKA-1655:
---------------------------------

Mailing list thread which led to the creation of this ticket:
{code}
On Sat, 27 Sep 2014 08:31:01 -0700, Jun Rao <ju...@gmail.com> wrote:
> Valentin,
> 
> That's a good point. We didn't have this use case in mind when designing
> the new consumer API. A straightforward implementation could be to remove
> the locally cached topic metadata for unsubscribed topics. It's probably
> possible to add a config value to avoid churn in caching the metadata.
> Could you file a JIRA so that we can track this?
> 
> Thanks,
> 
> Jun
> 
> On Thu, Sep 25, 2014 at 4:19 AM, Valentin <ka...@sblk.de> wrote:
> 
>>
>> Hi Jun, Hi Guozhang,
>>
>> Hm, yeah, if subscribe/unsubscribe is a smart and lightweight operation,
>> this might work. But if it needs to make any additional calls to fetch
>> metadata during a subscribe/unsubscribe call, the overhead could get
>> quite problematic. The main issue I still see here is that an additional
>> layer is added which does not really provide any benefit for a use case
>> like mine.
>> I.e. the leader discovery and connection handling you mention below don't
>> really offer value in this case, since with the connection pooling
>> approach suggested I will have to discover and maintain leader metadata
>> in my own code anyway, as well as handle connection pooling. So if I
>> understand the current plans for the Kafka 0.9 consumer correctly, it
>> just doesn't work well for my use case. Sure, there are workarounds to
>> make it work in my scenario, but I doubt any of them would scale as well
>> as my current SimpleConsumer approach :|
>> Or am I missing something here?
>>
>> Greetings
>> Valentin
>>
>> On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <ju...@gmail.com> wrote:
>> > Valentin,
>> >
>> > As Guozhang mentioned, to use the new consumer in the SimpleConsumer
>> > way, you would subscribe to a set of topic partitions and then issue
>> > poll(). You can change subscriptions on every poll since it's cheap.
>> > The benefit you get is that it does things like leader discovery and
>> > maintaining connections to the leader automatically for you.
>> >
>> > In any case, we will leave the old consumer, including the
>> > SimpleConsumer, for some time even after the new consumer is out.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Tue, Sep 23, 2014 at 12:23 PM, Valentin <ka...@sblk.de>
>> wrote:
>> >
>> >> Hi Jun,
>> >>
>> >> yes, that would theoretically be possible, but it does not scale at
>> >> all.
>> >>
>> >> I.e. in the current HTTP REST API use case, I have 5 connection pools
>> >> on every Tomcat server (as I have 5 brokers) and each connection pool
>> >> holds up to 10 SimpleConsumer connections. So all in all I get a
>> >> maximum of 50 open connections per web application server. And with
>> >> that I am able to handle most requests from HTTP consumers without
>> >> having to open/close any new connections to a broker host.
>> >>
>> >> If I were now to do the same implementation with the new Kafka 0.9
>> >> high level consumer, I would end up with >1000 connection pools (as I
>> >> have >1000 topic partitions) and each of these connection pools would
>> >> contain a number of consumer connections. So all in all, I would end
>> >> up with thousands of connection objects per application server. Not
>> >> really a viable approach :|
>> >>
>> >> Currently I am wondering what the rationale is for deprecating the
>> >> SimpleConsumer API, if there are use cases which just work much
>> >> better using it.
>> >>
>> >> Greetings
>> >> Valentin
>> >>
>> >> On 23/09/14 18:16, Guozhang Wang wrote:
>> >> > Hello,
>> >> >
>> >> > For your use case, with the new consumer you can still create a new
>> >> > consumer instance for each topic / partition, and remember the
>> >> > mapping of topic / partition => consumer. Then upon receiving the
>> >> > HTTP request you can decide which consumer to use. Since the new
>> >> > consumer is single-threaded, creating this many new consumers is
>> >> > roughly the same cost as the old simple consumer.
>> >> >
>> >> > Guozhang
>> >> >
>> >> > On Tue, Sep 23, 2014 at 2:32 AM, Valentin <ka...@sblk.de>
>> >> > wrote:
>> >> >
>> >> >>
>> >> >> Hi Jun,
>> >> >>
>> >> >> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <ju...@gmail.com>
>> wrote:
>> >> >>> The new consumer api will also allow you to do what you want in a
>> >> >>> SimpleConsumer (e.g., subscribe to a static set of partitions,
>> >> >>> control
>> >> >>> initial offsets, etc), only more conveniently.
>> >> >>
>> >> >> Yeah, I have reviewed the available Javadocs for the new Kafka 0.9
>> >> >> consumer APIs.
>> >> >> However, while they still allow me to do roughly what I want, I
>> >> >> fear that they will result in an overall much worse performing
>> >> >> implementation on my side.
>> >> >> The main problem I have in my scenario is that consumer requests
>> >> >> are coming in via stateless HTTP requests (each request is
>> >> >> standalone and specifies topics+partitions+offsets to read data
>> >> >> from) and I need to find a good way to do connection pooling to
>> >> >> the Kafka backend for good performance. The SimpleConsumer would
>> >> >> allow me to do that; an approach with the new Kafka 0.9 consumer
>> >> >> API seems to have a lot more overhead.
>> >> >>
>> >> >> Basically, what I am looking for is a way to pool connections per
>> >> >> Kafka broker host, independent of the topics/partitions/clients/...,
>> >> >> so each Tomcat app server would keep N disjoint connection pools
>> >> >> if I have N Kafka broker hosts.
>> >> >> I would then keep some central metadata which tells me which hosts
>> >> >> are the leaders for which topic+partition, and for an incoming HTTP
>> >> >> client request I'd just take a Kafka connection from the pool for
>> >> >> that particular broker host, request the data and return the
>> >> >> connection to the pool. This means that a Kafka broker host will
>> >> >> get requests from lots of different end consumers via the same TCP
>> >> >> connection (sequentially of course).
>> >> >>
>> >> >> With the new Kafka consumer API I would have to
>> >> >> subscribe/unsubscribe from topics every time I take a connection
>> >> >> from the pool, and as the request may need to go to a different
>> >> >> broker host than the last one, that wouldn't even prevent all the
>> >> >> connection/reconnection overhead. I guess I could create one
>> >> >> dedicated connection pool per topic-partition; that way
>> >> >> connection/reconnection overhead should be minimized, but then I'd
>> >> >> end up with hundreds of connection pools per app server, also not
>> >> >> a good approach.
>> >> >> All in all, the planned design of the new consumer API just
>> >> >> doesn't seem to fit my use case well. Which is why I am a bit
>> >> >> anxious about the SimpleConsumer API being deprecated.
>> >> >>
>> >> >> Or am I missing something here? Thanks!
>> >> >>
>> >> >> Greetings
>> >> >> Valentin
{code}
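The connection pooling pattern discussed in the thread above -- one bounded pool per broker, with leader metadata maintained outside the consumer to route each topic+partition request to the right pool -- can be sketched roughly as follows. This is a hypothetical illustration, not real Kafka client code; the BrokerConnection class and its fetch method stand in for an actual SimpleConsumer and its wire protocol:

```python
from queue import Queue

class BrokerConnection:
    """Stand-in for a real TCP connection to one Kafka broker."""
    def __init__(self, host, port):
        self.host, self.port = host, port

    def fetch(self, topic, partition, offset, max_bytes=65536):
        # A real implementation would send a FetchRequest over the wire.
        raise NotImplementedError("wire protocol omitted in this sketch")

class BrokerPools:
    """One bounded connection pool per broker, shared by all HTTP requests."""
    def __init__(self, brokers, pool_size=10):
        self._pools = {}
        for host, port in brokers:
            pool = Queue(maxsize=pool_size)
            for _ in range(pool_size):
                pool.put(BrokerConnection(host, port))
            self._pools[(host, port)] = pool
        # (topic, partition) -> (host, port) of the current leader;
        # refreshed externally from cluster metadata, as described above.
        self.leaders = {}

    def fetch(self, topic, partition, offset):
        broker = self.leaders[(topic, partition)]  # external metadata lookup
        pool = self._pools[broker]
        conn = pool.get()                          # borrow a pooled connection
        try:
            return conn.fetch(topic, partition, offset)
        finally:
            pool.put(conn)                         # always return it to the pool
```

With 5 brokers and a pool size of 10, this caps a web application server at 50 open broker connections regardless of how many topic partitions exist -- the numbers Valentin quotes in the thread.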

> Allow high performance SimpleConsumer use cases to still work with new Kafka 0.9 consumer APIs
> ----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1655
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1655
>             Project: Kafka
>          Issue Type: New Feature
>          Components: consumer
>    Affects Versions: 0.9.0
>            Reporter: Valentin
>            Assignee: Neha Narkhede
>
> Hi guys,
> currently Kafka allows consumers to choose either the low-level or the high-level API, depending on the specific requirements of the consumer implementation. However, I was told that the current low-level API (SimpleConsumer) will be deprecated once the new Kafka 0.9 consumer APIs are available.
> In this case it would be good if we could ensure that the new API offers some way to get similar performance for use cases which perfectly fit the old SimpleConsumer API approach.
> Example Use Case:
> A high throughput HTTP API wrapper for consumer requests which gets HTTP REST calls to retrieve data for a specific set of topic partitions and offsets.
> Here the SimpleConsumer is perfect, because it allows connection pooling in the HTTP API web application with one pool per existing Kafka broker, and the web application can handle the required metadata management to know which pool to fetch a connection from for each used topic partition. This means connections to Kafka brokers can remain open/pooled, and connection/reconnection and metadata overhead is minimized.
> To achieve something similar with the new Kafka 0.9 consumer APIs, it would be good if it could:
> - provide a low-level call to connect to a specific broker and to read data from a topic+partition+offset
> OR
> - ensure that subscribe/unsubscribe calls are very cheap and can run without requiring any network traffic. If I subscribe to a topic partition whose leader is the same broker as that of the last topic partition in use on this consumer API connection, then the consumer API implementation should recognize this, skip any disconnects/reconnects and just reuse the existing connection to that Kafka broker.
> Or put differently, it should be possible to do external metadata handling in the consumer API client and the client should be able to pool consumer API connections effectively by having one pool per Kafka broker.
> Greetings
> Valentin
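The second option the ticket asks for -- subscribe calls cheap enough to reuse the existing broker connection whenever the new partition's leader is unchanged -- amounts to a single check before reconnecting. A hypothetical sketch of that check (the `connect` factory and the `leaders` map are assumptions for illustration, not part of any real consumer API):

```python
class PooledConsumer:
    """Sketch of the reuse check requested above: on (re)subscribe, keep the
    current broker connection when the new partition's leader is the broker
    we are already connected to, and reconnect only on a leader change."""
    def __init__(self, leaders, connect):
        self.leaders = leaders      # (topic, partition) -> broker address
        self._connect = connect     # factory: broker address -> connection
        self._broker = None         # broker we currently hold a connection to
        self._conn = None
        self.reconnects = 0         # counts actual connection setups

    def subscribe(self, topic, partition):
        leader = self.leaders[(topic, partition)]
        if leader != self._broker:  # only reconnect on a leader change
            self._conn = self._connect(leader)
            self._broker = leader
            self.reconnects += 1
        self.assignment = (topic, partition)
```

Subscribing to a sequence of partitions that share a leader would then cost exactly one connection setup, which is the behavior the ticket describes.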



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)