
Posted to users@kafka.apache.org by Valentin <ka...@sblk.de> on 2014/09/22 17:10:18 UTC

Questions about Kafka 0.9 API changes

Hello,

I am currently working on a Kafka implementation and have a couple of
questions concerning the road map for the future.
As I am unsure where to put such questions, I decided to try my luck on
this mailing list. If this is the wrong place for such inquiries, I
apologize. In this case it would be great if someone could offer some
pointers as to where to find/get these answers.

So, here I go :)

1) Consumer Redesign in Kafka 0.9
I found a number of documents explaining planned changes to the consumer
APIs for Kafka version 0.9. However, these documents are only mentioning
the high level consumer implementations. Does anyone know if the
kafka.javaapi.consumer.SimpleConsumer API/implementation will also change
with 0.9? Or will that stay more or less as it is now?

2) Pooling of Kafka Connections - SimpleConsumer
As I have a use case where the connection between the final consumers and
Kafka needs to happen via HTTP, I am concerned about performance
implications of the required HTTP wrapping. I am planning to implement a
custom HTTP API for Kafka producers and consumers which will be stateless
and where offset tracking will be done on the final consumer side. The
question here is whether anyone has experience with pooling
connections to Kafka brokers in order to reuse them effectively for
incoming, stateless HTTP REST calls. An idea here would be to have one
connection pool per broker host and to keep a set of open
consumers/connections for each broker in those pools. Once I know which
broker is the leader for a requested topic partition for a REST call, I
could then use an already existing consumer/connection from that pool for
the processing of that REST call and then return it to the pool. So I'd be
able to have completely stateless REST call handling without having to
open/close Kafka connections all the time.
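
A minimal sketch of the per-broker pooling idea described in 2), assuming
Java 8 or later. BrokerPools, borrow, giveBack and the connection factory are
hypothetical names chosen for illustration and are not part of the Kafka API;
in a real setup the factory would presumably create a SimpleConsumer for the
given broker host. The sketch only bounds the number of idle connections kept
per broker, not the total number of open connections.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** One bounded pool of idle connections per broker host (illustrative sketch only). */
class BrokerPools<C> {
    private final Map<String, BlockingQueue<C>> idle = new ConcurrentHashMap<>();
    private final Function<String, C> connect; // hypothetical factory: broker host -> new connection
    private final int maxIdlePerBroker;

    BrokerPools(Function<String, C> connect, int maxIdlePerBroker) {
        this.connect = connect;
        this.maxIdlePerBroker = maxIdlePerBroker;
    }

    /** Reuse an idle connection to this broker if there is one, otherwise open a new one. */
    C borrow(String brokerHost) {
        BlockingQueue<C> pool = idle.get(brokerHost);
        C conn = (pool == null) ? null : pool.poll();
        return (conn != null) ? conn : connect.apply(brokerHost);
    }

    /** Hand a connection back; returns false if the pool is full and the caller should close it. */
    boolean giveBack(String brokerHost, C conn) {
        return idle
            .computeIfAbsent(brokerHost, h -> new ArrayBlockingQueue<>(maxIdlePerBroker))
            .offer(conn);
    }

    /** Number of idle connections currently pooled for this broker. */
    int idleCount(String brokerHost) {
        BlockingQueue<C> pool = idle.get(brokerHost);
        return (pool == null) ? 0 : pool.size();
    }

    public static void main(String[] args) {
        AtomicInteger opened = new AtomicInteger();
        // Strings stand in for real broker connections in this demo.
        BrokerPools<String> pools =
            new BrokerPools<>(host -> host + "#" + opened.incrementAndGet(), 10);
        String leader = "broker-1"; // would come from the leader lookup for the requested partition
        String conn = pools.borrow(leader);   // first REST call opens a connection
        pools.giveBack(leader, conn);
        String reused = pools.borrow(leader); // second REST call reuses it
        System.out.println("same connection reused: " + conn.equals(reused)
            + ", connections opened: " + opened.get());
    }
}
```

Each REST call would look up the leader for the requested topic partition,
borrow a connection for that broker, fetch, and give the connection back.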

3) Pooling of Kafka Connections - KafkaConsumer (Kafka 0.9)
Now let's assume I want to implement the idea from 2) but with the high
level KafkaConsumer (to leave identification of partition leaders and
error handling to it). Are any implementation details already known/decided
on how the subscribe, unsubscribe and seek methods will work internally?
Would I be able to somehow reuse connected KafkaConsumer objects in
connection pools? Could I for example call subscribe/unsubscribe/seek for
each HTTP request on a consumer to switch topics/partitions to the
currently needed set or would this be a very expensive operation (i.e.
because it would fetch metadata from Kafka to identify the leader for each
partition)?

Greetings
Valentin

Re: Questions about Kafka 0.9 API changes

Posted by Valentin <ka...@sblk.de>.

Hi Guozhang,

On Mon, 22 Sep 2014 10:08:58 -0700, Guozhang Wang <wa...@gmail.com>
wrote:
> 1) The new consumer clients will be developed under a new directory. The
> old consumer, including the SimpleConsumer, will not be changed, though it
> will be retired in the 0.9 release.

So that means that the SimpleConsumer API will be deprecated with Kafka
0.9?
Will it nevertheless be available in future versions of Kafka, or should
I avoid using it for new development?
I am a bit concerned here as I fear that the new 0.9 Kafka consumer does
not fit my requirements as well as the current SimpleConsumer.

Greetings
Valentin


> 2) I am not very familiar with HTTP wrappers on the clients; could someone
> who has done so comment here?
> 
> 3) The new KafkaConsumer has not been fully implemented, but you can take a
> look at its JavaDoc for examples of using the new client.
> 
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/
> 
> Guozhang
> 
> 

Re: Questions about Kafka 0.9 API changes

Posted by Guozhang Wang <wa...@gmail.com>.

Hello,

1) The new consumer clients will be developed under a new directory. The
old consumer, including the SimpleConsumer, will not be changed, though it
will be retired in the 0.9 release.

2) I am not very familiar with HTTP wrappers on the clients; could someone
who has done so comment here?

3) The new KafkaConsumer has not been fully implemented, but you can take a
look at its JavaDoc for examples of using the new client.

http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/

Guozhang





-- 
-- Guozhang

Re: conflicted ephemeral node error

Posted by Guozhang Wang <wa...@gmail.com>.

Hi Snehalata,

Did you see this log only a few times, or does it keep spilling into the log file?

Guozhang

On Sun, Sep 28, 2014 at 11:20 PM, Snehalata Nagaje <
snehalata.nagaje@harbingergroup.com> wrote:

>
>
> Hi ,
>
>
> I am getting this error in kafka logs
>
> I wrote this conflicted ephemeral node
> [{"version":1,"brokerid":0,"timestamp":"1411969901501"}] at /controller a
> while back in a different session, hence I will backoff for this node to be
> deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
>
> Can anyone help me with this?
>
> Thanks,
> Snehalata
>



-- 
-- Guozhang

conflicted ephemeral node error

Posted by Snehalata Nagaje <sn...@harbingergroup.com>.


Hi ,


I am getting this error in kafka logs

I wrote this conflicted ephemeral node [{"version":1,"brokerid":0,"timestamp":"1411969901501"}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)

Can anyone help me with this?

Thanks,
Snehalata
Disclaimer:
This e-mail may contain Privileged/Confidential information and is intended only for the individual(s) named. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. Please notify the sender, if you have received this e-mail by mistake and delete it from your system. Information in this message that does not relate to the official business of the company shall be understood as neither given nor endorsed by it. E-mail transmission cannot be guaranteed to be secure or error-free. The sender does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. Visit us at http://www.harbingergroup.com/

Re: Questions about Kafka 0.9 API changes

Posted by Guozhang Wang <wa...@gmail.com>.

Thanks Valentin!

Guozhang

On Sun, Sep 28, 2014 at 3:49 PM, Valentin <ka...@sblk.de> wrote:

>
> Hi Jun,
>
> ok, I created:
> https://issues.apache.org/jira/browse/KAFKA-1655
>
> Greetings
> Valentin
>
> On Sat, 27 Sep 2014 08:31:01 -0700, Jun Rao <ju...@gmail.com> wrote:
> > Valentin,
> >
> > That's a good point. We didn't have this use case in mind when designing the
> > new consumer api. A straightforward implementation could be removing the
> > locally cached topic metadata for unsubscribed topics. It's probably
> > possible to add a config value to avoid churns in caching the metadata.
> > Could you file a jira so that we can track this?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Sep 25, 2014 at 4:19 AM, Valentin <ka...@sblk.de> wrote:
> >
> >>
> >> Hi Jun, Hi Guozhang,
> >>
> >> hm, yeah, if the subscribe/unsubscribe is a smart and lightweight
> >> operation this might work. But if it needs to do any additional calls to
> >> fetch metadata during a subscribe/unsubscribe call, the overhead could get
> >> quite problematic. The main issue I still see here is that an additional
> >> layer is added which does not really provide any benefit for a use case
> >> like mine.
> >> I.e. the leader discovery and connection handling you mention below don't
> >> really offer value in this case, as for the connection pooling approach
> >> suggested, I will have to discover and maintain leader metadata in my own
> >> code anyway as well as handling connection pooling. So if I understand the
> >> current plans for the Kafka 0.9 consumer correctly, it just doesn't work
> >> well for my use case. Sure, there are workarounds to make it work in my
> >> scenario, but I doubt any of them would scale as well as my current
> >> SimpleConsumer approach :|
> >> Or am I missing something here?
> >>
> >> Greetings
> >> Valentin
> >>
> >> On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <ju...@gmail.com> wrote:
> >> > Valentin,
> >> >
> >> > As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
> >> > you would subscribe to a set of topic partitions and then issue poll(). You
> >> > can change subscriptions on every poll since it's cheap. The benefit you
> >> > get is that it does things like leader discovery and maintaining
> >> > connections to the leader automatically for you.
> >> >
> >> > In any case, we will leave the old consumer including the SimpleConsumer
> >> > for some time even after the new consumer is out.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Tue, Sep 23, 2014 at 12:23 PM, Valentin <ka...@sblk.de> wrote:
> >> >
> >> >> Hi Jun,
> >> >>
> >> >> yes, that would theoretically be possible, but it does not scale at all.
> >> >>
> >> >> I.e. in the current HTTP REST API use case, I have 5 connection pools on
> >> >> every tomcat server (as I have 5 brokers) and each connection pool holds
> >> >> up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
> >> >> open connections per web application server. And with that I am able to
> >> >> handle most requests from HTTP consumers without having to open/close
> >> >> any new connections to a broker host.
> >> >>
> >> >> If I would now do the same implementation with the new Kafka 0.9 high
> >> >> level consumer, I would end up with >1000 connection pools (as I have >1000
> >> >> topic partitions) and each of these connection pools would contain
> >> >> a number of consumer connections. So all in all, I would end up with
> >> >> thousands of connection objects per application server. Not really a
> >> >> viable approach :|
> >> >>
> >> >> Currently I am wondering what the rationale is for deprecating the
> >> >> SimpleConsumer API, if there are use cases which just work much better
> >> >> using it.
> >> >>
> >> >> Greetings
> >> >> Valentin
> >> >>
> >> >> On 23/09/14 18:16, Guozhang Wang wrote:
> >> >> > Hello,
> >> >> >
> >> >> > For your use case, with the new consumer you can still create a new
> >> >> > consumer instance for each topic / partition, and remember the mapping
> >> >> > of topic / partition => consumer. Then, upon receiving the HTTP request,
> >> >> > you can decide which consumer to use. Since the new consumer is single
> >> >> > threaded, creating this many new consumers has roughly the same cost
> >> >> > as with the old simple consumer.
> >> >> >
> >> >> > Guozhang
> >> >> >
> >> >> > On Tue, Sep 23, 2014 at 2:32 AM, Valentin <ka...@sblk.de>
> >> >> > wrote:
> >> >> >
> >> >> >>
> >> >> >> Hi Jun,
> >> >> >>
> >> >> >> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <ju...@gmail.com> wrote:
> >> >> >>> The new consumer api will also allow you to do what you want in a
> >> >> >>> SimpleConsumer (e.g., subscribe to a static set of partitions, control
> >> >> >>> initial offsets, etc), only more conveniently.
> >> >> >>
> >> >> >> Yeah, I have reviewed the available javadocs for the new Kafka 0.9
> >> >> >> consumer APIs.
> >> >> >> However, while they still allow me to do roughly what I want, I fear that
> >> >> >> they will result in an overall much worse performing implementation on my
> >> >> >> side.
> >> >> >> The main problem I have in my scenario is that consumer requests are
> >> >> >> coming in via stateless HTTP requests (each request is standalone and
> >> >> >> specifies topics+partitions+offsets to read data from) and I need to find a
> >> >> >> good way to do connection pooling to the Kafka backend for good
> >> >> >> performance. The SimpleConsumer would allow me to do that, an approach with
> >> >> >> the new Kafka 0.9 consumer API seems to have a lot more overhead.
> >> >> >>
> >> >> >> Basically, what I am looking for is a way to pool connections per Kafka
> >> >> >> broker host, independent of the topics/partitions/clients/..., so each
> >> >> >> Tomcat app server would keep N disjoint connection pools, if I have N
> >> >> >> Kafka broker hosts.
> >> >> >> I would then keep some central metadata which tells me which hosts are the
> >> >> >> leaders for which topic+partition and for an incoming HTTP client request
> >> >> >> I'd just take a Kafka connection from the pool for that particular broker
> >> >> >> host, request the data and return the connection to the pool. This means
> >> >> >> that a Kafka broker host will get requests from lots of different end
> >> >> >> consumers via the same TCP connection (sequentially of course).
> >> >> >>
> >> >> >> With the new Kafka consumer API I would have to subscribe/unsubscribe from
> >> >> >> topics every time I take a connection from the pool and as the request may
> >> >> >> need to go to a different broker host than the last one, that wouldn't even
> >> >> >> prevent all the connection/reconnection overhead. I guess I could create
> >> >> >> one dedicated connection pool per topic-partition, that way
> >> >> >> connection/reconnection overhead should be minimized, but that way I'd end
> >> >> >> up with hundreds of connection pools per app server, also not a good
> >> >> >> approach.
> >> >> >> All in all, the planned design of the new consumer API just doesn't seem
> >> >> >> to fit my use case well. Which is why I am a bit anxious about the
> >> >> >> SimpleConsumer API being deprecated.
> >> >> >>
> >> >> >> Or am I missing something here? Thanks!
> >> >> >>
> >> >> >> Greetings
> >> >> >> Valentin
> >> >>
> >> >>
> >>
>



-- 
-- Guozhang

Re: Questions about Kafka 0.9 API changes

Posted by Valentin <ka...@sblk.de>.

Hi Jun,

ok, I created:
https://issues.apache.org/jira/browse/KAFKA-1655

Greetings
Valentin


Re: Questions about Kafka 0.9 API changes

Posted by Jun Rao <ju...@gmail.com>.

Valentin,

That's a good point. We didn't have this use case in mind when designing the
new consumer api. A straightforward implementation could be removing the
locally cached topic metadata for unsubscribed topics. It's probably
possible to add a config value to avoid churns in caching the metadata.
Could you file a jira so that we can track this?
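
To illustrate the idea, here is a minimal sketch of a metadata cache that
keeps entries for a configurable time after a topic is unsubscribed, so that
frequent subscribe/unsubscribe cycles do not force a new metadata fetch each
time. This is not the actual Kafka client code: TopicMetadataCache, its
methods and the retainAfterUnsubscribeMs setting are hypothetical, and a
single leader string per topic stands in for real per-partition metadata.
With a retention of 0 it behaves like the straightforward variant that drops
metadata as soon as a topic is unsubscribed.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Topic metadata cache that keeps entries for a grace period after unsubscribe (sketch). */
class TopicMetadataCache {
    private final long retainAfterUnsubscribeMs; // hypothetical config value
    private final Map<String, String> leaderByTopic = new HashMap<>();
    private final Map<String, Long> unsubscribedAtMs = new HashMap<>();

    TopicMetadataCache(long retainAfterUnsubscribeMs) {
        this.retainAfterUnsubscribeMs = retainAfterUnsubscribeMs;
    }

    /** Store metadata obtained from a (simulated) metadata fetch. */
    void update(String topic, String leader) {
        leaderByTopic.put(topic, leader);
    }

    /** Subscribe again; returns true if metadata is still cached, i.e. no fetch is needed. */
    boolean subscribe(String topic) {
        unsubscribedAtMs.remove(topic);
        return leaderByTopic.containsKey(topic);
    }

    /** Mark a topic as unsubscribed at the given time instead of dropping it right away. */
    void unsubscribe(String topic, long nowMs) {
        if (leaderByTopic.containsKey(topic)) {
            unsubscribedAtMs.put(topic, nowMs);
        }
    }

    /** Drop metadata of topics that have been unsubscribed for at least the retention time. */
    void evictExpired(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = unsubscribedAtMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() >= retainAfterUnsubscribeMs) {
                leaderByTopic.remove(e.getKey());
                it.remove();
            }
        }
    }

    boolean isCached(String topic) {
        return leaderByTopic.containsKey(topic);
    }

    public static void main(String[] args) {
        TopicMetadataCache cache = new TopicMetadataCache(60_000);
        cache.update("orders", "broker-1");
        cache.unsubscribe("orders", 0);
        cache.evictExpired(30_000); // still within the retention window
        System.out.println("fetch avoided on resubscribe: " + cache.subscribe("orders"));
    }
}
```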

Thanks,

Jun

On Thu, Sep 25, 2014 at 4:19 AM, Valentin <ka...@sblk.de> wrote:

>
> Hi Jun, Hi Guozhang,
>
> hm, yeah, if the subscribe/unsubscribe is a smart and lightweight
> operation this might work. But if it needs to do any additional calls to
> fetch metadata during a subscribe/unsubscribe call, the overhead could get
> quite problematic. The main issue I still see here is that an additional
> layer is added which does not really provide any benefit for a use case
> like mine.
> I.e. the leader discovery and connection handling you mention below don't
> really offer value in this case, as for the connection pooling approach
> suggested, I will have to discover and maintain leader metadata in my own
> code anyway as well as handling connection pooling. So if I understand the
> current plans for the Kafka 0.9 consumer correctly, it just doesn't work
> well for my use case. Sure, there are workarounds to make it work in my
> scenario, but I doubt any of them would scale as well as my current
> SimpleConsumer approach :|
> Or am I missing something here?
>
> Greetings
> Valentin
>
> On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <ju...@gmail.com> wrote:
> > Valentin,
> >
> > As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
> > you would subscribe to a set of topic partitions and then issue poll(). You
> > can change subscriptions on every poll since it's cheap. The benefit you
> > get is that it does things like leader discovery and maintaining
> > connections to the leader automatically for you.
> >
> > In any case, we will leave the old consumer including the SimpleConsumer
> > for some time even after the new consumer is out.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Sep 23, 2014 at 12:23 PM, Valentin <ka...@sblk.de> wrote:
> >
> >> Hi Jun,
> >>
> >> yes, that would theoretically be possible, but it does not scale at all.
> >>
> >> I.e. in the current HTTP REST API use case, I have 5 connection pools on
> >> every tomcat server (as I have 5 brokers) and each connection pool holds
> >> up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
> >> open connections per web application server. And with that I am able to
> >> handle most requests from HTTP consumers without having to open/close
> >> any new connections to a broker host.
> >>
> >> If I would now do the same implementation with the new Kafka 0.9 high
> >> level consumer, I would end up with >1000 connection pools (as I have >1000
> >> topic partitions) and each of these connection pools would contain
> >> a number of consumer connections. So all in all, I would end up with
> >> thousands of connection objects per application server. Not really a
> >> viable approach :|
> >>
> >> Currently I am wondering what the rationale is for deprecating the
> >> SimpleConsumer API, if there are use cases which just work much better
> >> using it.
> >>
> >> Greetings
> >> Valentin
> >>
> >> On 23/09/14 18:16, Guozhang Wang wrote:
> >> > Hello,
> >> >
> >> > For your use case, with the new consumer you can still create a new
> >> > consumer instance for each topic / partition, and remember the mapping
> >> > of topic / partition => consumer. Then, upon receiving the HTTP request,
> >> > you can decide which consumer to use. Since the new consumer is single
> >> > threaded, creating this many new consumers has roughly the same cost
> >> > as with the old simple consumer.
> >> >
> >> > Guozhang
> >> >
> >> > On Tue, Sep 23, 2014 at 2:32 AM, Valentin <ka...@sblk.de>
> >> > wrote:
> >> >
> >> >>
> >> >> Hi Jun,
> >> >>
> >> >> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <ju...@gmail.com>
> wrote:
> >> >>> The new consumer api will also allow you to do what you want in a
> >> >>> SimpleConsumer (e.g., subscribe to a static set of partitions,
> >> >>> control
> >> >>> initial offsets, etc), only more conveniently.
> >> >>
> >> >> Yeah, I have reviewed the available javadocs for the new Kafka 0.9
> >> >> consumer APIs.
> >> >> However, while they still allow me to do roughly what I want, I fear
> >> that
> >> >> they will result in an overall much worse performing implementation
> on
> >> my
> >> >> side.
> >> >> The main problem I have in my scenario is that consumer requests are
> >> >> coming in via stateless HTTP requests (each request is standalone
> and
> >> >> specifies topics+partitions+offsets to read data from) and I need to
> >> find a
> >> >> good way to do connection pooling to the Kafka backend for good
> >> >> performance. The SimpleConsumer would allow me to do that, an
> approach
> >> with
> >> >> the new Kafka 0.9 consumer API seems to have a lot more overhead.
> >> >>
> >> >> Basically, what I am looking for is a way to pool connections per
> >> >> Kafka
> >> >> broker host, independent of the topics/partitions/clients/..., so
> each
> >> >> Tomcat app server would keep N disjunctive connection pools, if I
> >> >> have N
> >> >> Kafka broker hosts.
> >> >> I would then keep some central metadata which tells me which hosts
> are
> >> the
> >> >> leaders for which topic+partition and for an incoming HTTP client
> >> request
> >> >> I'd just take a Kafka connection from the pool for that particular
> >> broker
> >> >> host, request the data and return the connection to the pool. This
> >> >> means
> >> >> that a Kafka broker host will get requests from lots of different
> end
> >> >> consumers via the same TCP connection (sequentially of course).
> >> >>
> >> >> With the new Kafka consumer API I would have to
> subscribe/unsubscribe
> >> from
> >> >> topics every time I take a connection from the pool and as the
> request
> >> may
> >> >> need go to a different broker host than the last one, that wouldn't
> >> >> even
> >> >> prevent all the connection/reconnection overhead. I guess I could
> >> >> create
> >> >> one dedicated connection pool per topic-partition, that way
> >> >> connection/reconnection overhead should be minimized, but that way
> I'd
> >> end
> >> >> up with hundreds of connection pools per app server, also not a good
> >> >> approach.
> >> >> All in all, the planned design of the new consumer API just doesn't
> >> >> seem
> >> >> to fit my use case well. Which is why I am a bit anxious about the
> >> >> SimpleConsumer API being deprecated.
> >> >>
> >> >> Or am I missing something here? Thanks!
> >> >>
> >> >> Greetings
> >> >> Valentin
> >>
> >>
>

Re: Questions about Kafka 0.9 API changes

Posted by Valentin <ka...@sblk.de>.

Hi Jun, Hi Guozhang,

hm, yeah, if subscribe/unsubscribe is a smart and lightweight
operation, this might work. But if it needs to make any additional calls to
fetch metadata during a subscribe/unsubscribe call, the overhead could get
quite problematic. The main issue I still see here is that an additional
layer is added which does not really provide any benefit for a use case
like mine.
I.e. the leader discovery and connection handling you mention below don't
really offer value in this case: for the suggested connection pooling
approach, I will have to discover and maintain leader metadata in my own
code anyway, as well as handle the connection pooling itself. So if I
understand the current plans for the Kafka 0.9 consumer correctly, it just
doesn't work well for my use case. Sure, there are workarounds to make it
work in my scenario, but I doubt any of them would scale as well as my
current SimpleConsumer approach :|
Or am I missing something here?

Greetings
Valentin

On Wed, 24 Sep 2014 17:44:15 -0700, Jun Rao <ju...@gmail.com> wrote:
> Valentin,
> 
> As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
> you would subscribe to a set of topic partitions and then issue poll(). You
> can change subscriptions on every poll since it's cheap. The benefit you
> get is that it does things like leader discovery and maintaining
> connections to the leader automatically for you.
> 
> In any case, we will leave the old consumer including the SimpleConsumer
> for some time even after the new consumer is out.
> 
> Thanks,
> 
> Jun
> 

Re: Questions about Kafka 0.9 API changes

Posted by Jun Rao <ju...@gmail.com>.

Valentin,

As Guozhang mentioned, to use the new consumer in the SimpleConsumer way,
you would subscribe to a set of topic partitions and then issue poll(). You
can change subscriptions on every poll since it's cheap. The benefit you
get is that it does things like leader discovery and maintaining
connections to the leader automatically for you.

In any case, we will leave the old consumer, including the SimpleConsumer,
in place for some time even after the new consumer is out.

Thanks,

Jun

On Tue, Sep 23, 2014 at 12:23 PM, Valentin <ka...@sblk.de> wrote:

> Hi Jun,
>
> yes, that would theoretically be possible, but it does not scale at all.
>
> I.e. in the current HTTP REST API use case, I have 5 connection pools on
> every tomcat server (as I have 5 brokers) and each connection pool holds
> upto 10 SimpleConsumer connections. So all in all I get a maximum of 50
> open connections per web application server. And with that I am able to
> handle most requests from HTTP consumers without having to open/close
> any new connections to a broker host.
>
> If I would now do the same implementation with the new Kafka 0.9 high
> level consumer, I would end up with >1000 connection pools (as I have
> >1000 topic partitions) and each of these connection pools would contain
> a number of consumer connections. So all in all, I would end up with
> thousands of connection objects per application server. Not really a
> viable approach :|
>
> Currently I am wondering what the rationale is for deprecating the
> SimpleConsumer API, if there are use cases which just work much better
> using it.
>
> Greetings
> Valentin
>

Re: Questions about Kafka 0.9 API changes

Posted by Guozhang Wang <wa...@gmail.com>.

Hi Valentin,

I see your point. Would the following work for you then? You can
maintain the broker metadata as you already do and use one 0.9 Kafka
consumer per broker. Calling subscribe / unsubscribe would then not make
the consumer close / re-connect to the broker, as long as it is implemented
smartly enough to realize that the newly subscribed topic is still on the
currently connected broker.
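
A rough sketch of the bookkeeping this would need on the application side, not code from the thread: given the leader metadata the application already maintains, split the partitions of one HTTP request by leader broker, so that each group can be handed to the consumer dedicated to that broker. `LeaderGrouping`, `groupByLeader` and the "topic-partition" string keys are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: splits requested partitions by their current leader broker. */
final class LeaderGrouping {
    /**
     * @param requested         partitions named in one HTTP request, as "topic-partition" keys
     * @param leaderByPartition application-maintained metadata: partition key -> leader host
     * @return leader host -> partitions to fetch through that broker's consumer
     */
    static Map<String, List<String>> groupByLeader(List<String> requested,
                                                   Map<String, String> leaderByPartition) {
        Map<String, List<String>> byBroker = new TreeMap<>();
        for (String partitionKey : requested) {
            String leader = leaderByPartition.get(partitionKey);
            if (leader == null) {
                // Metadata is stale or incomplete; the caller would refresh it and retry.
                throw new IllegalArgumentException("no leader known for " + partitionKey);
            }
            byBroker.computeIfAbsent(leader, host -> new ArrayList<>()).add(partitionKey);
        }
        return byBroker;
    }

    public static void main(String[] args) {
        Map<String, String> leaders = new HashMap<>();
        leaders.put("events-0", "broker1");
        leaders.put("events-1", "broker2");
        leaders.put("logs-0", "broker1");
        System.out.println(groupByLeader(Arrays.asList("events-0", "events-1", "logs-0"), leaders));
    }
}
```

Whether switching subscriptions within one broker's group is actually cheap depends on how the 0.9 consumer ends up being implemented, which is the open question here.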

Guozhang

On Tue, Sep 23, 2014 at 12:23 PM, Valentin <ka...@sblk.de> wrote:

> Hi Jun,
>
> yes, that would theoretically be possible, but it does not scale at all.
>
> I.e. in the current HTTP REST API use case, I have 5 connection pools on
> every tomcat server (as I have 5 brokers) and each connection pool holds
> upto 10 SimpleConsumer connections. So all in all I get a maximum of 50
> open connections per web application server. And with that I am able to
> handle most requests from HTTP consumers without having to open/close
> any new connections to a broker host.
>
> If I would now do the same implementation with the new Kafka 0.9 high
> level consumer, I would end up with >1000 connection pools (as I have
> >1000 topic partitions) and each of these connection pools would contain
> a number of consumer connections. So all in all, I would end up with
> thousands of connection objects per application server. Not really a
> viable approach :|
>
> Currently I am wondering what the rationale is for deprecating the
> SimpleConsumer API, if there are use cases which just work much better
> using it.
>
> Greetings
> Valentin
>


-- 
-- Guozhang

Re: Questions about Kafka 0.9 API changes

Posted by Valentin <ka...@sblk.de>.

Hi Jun,

yes, that would theoretically be possible, but it does not scale at all.

I.e. in the current HTTP REST API use case, I have 5 connection pools on
every Tomcat server (as I have 5 brokers) and each connection pool holds
up to 10 SimpleConsumer connections. So all in all I get a maximum of 50
open connections per web application server. And with that I am able to
handle most requests from HTTP consumers without having to open/close
any new connections to a broker host.

If I now did the same implementation with the new Kafka 0.9 high
level consumer, I would end up with >1000 connection pools (as I have
>1000 topic partitions) and each of these connection pools would contain
a number of consumer connections. So all in all, I would end up with
thousands of connection objects per application server. Not really a
viable approach :|
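
To make the arithmetic concrete, here is a small sketch with hypothetical names (`PoolSizing`, `maxConnections`). The broker figures are the ones given above; the per-partition pool size of 2 is only an assumed example value, since the text just says "a number of" connections.

```java
/** Hypothetical helper comparing the two pool layouts by their connection upper bound. */
final class PoolSizing {
    /** Upper bound on pooled connections: one pool per key, each with a fixed capacity. */
    static long maxConnections(long poolCount, long connectionsPerPool) {
        return Math.multiplyExact(poolCount, connectionsPerPool);
    }

    public static void main(String[] args) {
        // Pools keyed by broker: 5 brokers, up to 10 SimpleConsumer connections each.
        System.out.println("per-broker pools:    " + maxConnections(5, 10));
        // Pools keyed by topic partition: 1000 partitions, assumed 2 consumers each.
        System.out.println("per-partition pools: " + maxConnections(1000, 2));
    }
}
```

The point is only that the bound grows with the number of partitions rather than the number of brokers once the pool key changes.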

Currently I am wondering what the rationale is for deprecating the
SimpleConsumer API, if there are use cases which just work much better
using it.

Greetings
Valentin

On 23/09/14 18:16, Guozhang Wang wrote:
> Hello,
> 
> For your use case, with the new consumer you can still create a new
> consumer instance for each  topic / partition, and remember the mapping of
> topic / partition => consumer. The upon receiving the http request you can
> then decide which consumer to use. Since the new consumer is single
> threaded, creating this many new consumers is roughly the same cost with
> the old simple consumer.
> 
> Guozhang
> 

Re: Questions about Kafka 0.9 API changes

Posted by Guozhang Wang <wa...@gmail.com>.

Hello,

For your use case, with the new consumer you can still create a new
consumer instance for each topic / partition, and remember the mapping of
topic / partition => consumer. Then, upon receiving the HTTP request, you can
decide which consumer to use. Since the new consumer is single
threaded, creating this many new consumers costs roughly the same as
with the old simple consumer.
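
The mapping could look roughly like the sketch below. It is an illustration only: `ConsumerRegistry` and `consumerFor` are hypothetical names, the type parameter `C` stands in for whatever consumer class is used, and the factory is supplied by the caller rather than taken from any Kafka API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical registry: one lazily created consumer per topic / partition. */
final class ConsumerRegistry<C> {
    private final Map<String, C> byPartition = new ConcurrentHashMap<>();
    private final BiFunction<String, Integer, C> factory;

    /** @param factory creates a consumer bound to the given topic and partition */
    ConsumerRegistry(BiFunction<String, Integer, C> factory) {
        this.factory = factory;
    }

    /** Returns the consumer for this topic / partition, creating it on first use. */
    C consumerFor(String topic, int partition) {
        return byPartition.computeIfAbsent(topic + "-" + partition,
                key -> factory.apply(topic, partition));
    }

    /** Number of consumers created so far. */
    int size() {
        return byPartition.size();
    }

    public static void main(String[] args) {
        ConsumerRegistry<String> registry =
                new ConsumerRegistry<>((topic, partition) -> "consumer for " + topic + "/" + partition);
        System.out.println(registry.consumerFor("events", 0));
        System.out.println(registry.size());
    }
}
```

Note that the registry holds exactly one consumer per topic / partition, so concurrent HTTP requests for the same partition would have to take turns on it if the consumer is not safe for concurrent use.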

Guozhang

On Tue, Sep 23, 2014 at 2:32 AM, Valentin <ka...@sblk.de> wrote:

>
> Hi Jun,
>
> On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <ju...@gmail.com> wrote:
> > The new consumer api will also allow you to do what you want in a
> > SimpleConsumer (e.g., subscribe to a static set of partitions, control
> > initial offsets, etc), only more conveniently.
>
> Yeah, I have reviewed the available javadocs for the new Kafka 0.9
> consumer APIs.
> However, while they still allow me to do roughly what I want, I fear that
> they will result in an overall much worse performing implementation on my
> side.
> The main problem I have in my scenario is that consumer requests are
> coming in via stateless HTTP requests (each request is standalone and
> specifies topics+partitions+offsets to read data from) and I need to find a
> good way to do connection pooling to the Kafka backend for good
> performance. The SimpleConsumer would allow me to do that, an approach with
> the new Kafka 0.9 consumer API seems to have a lot more overhead.
>
> Basically, what I am looking for is a way to pool connections per Kafka
> broker host, independent of the topics/partitions/clients/..., so each
> Tomcat app server would keep N disjunctive connection pools, if I have N
> Kafka broker hosts.
> I would then keep some central metadata which tells me which hosts are the
> leaders for which topic+partition and for an incoming HTTP client request
> I'd just take a Kafka connection from the pool for that particular broker
> host, request the data and return the connection to the pool. This means
> that a Kafka broker host will get requests from lots of different end
> consumers via the same TCP connection (sequentially of course).
>
> With the new Kafka consumer API I would have to subscribe/unsubscribe from
> topics every time I take a connection from the pool and as the request may
> need go to a different broker host than the last one, that wouldn't even
> prevent all the connection/reconnection overhead. I guess I could create
> one dedicated connection pool per topic-partition, that way
> connection/reconnection overhead should be minimized, but that way I'd end
> up with hundreds of connection pools per app server, also not a good
> approach.
> All in all, the planned design of the new consumer API just doesn't seem
> to fit my use case well. Which is why I am a bit anxious about the
> SimpleConsumer API being deprecated.
>
> Or am I missing something here? Thanks!
>
> Greetings
> Valentin
>



-- 
-- Guozhang

Re: Questions about Kafka 0.9 API changes

Posted by Valentin <ka...@sblk.de>.

Hi Jun,

On Mon, 22 Sep 2014 21:15:55 -0700, Jun Rao <ju...@gmail.com> wrote:
> The new consumer api will also allow you to do what you want in a
> SimpleConsumer (e.g., subscribe to a static set of partitions, control
> initial offsets, etc), only more conveniently.

Yeah, I have reviewed the available javadocs for the new Kafka 0.9
consumer APIs.
However, while they still allow me to do roughly what I want, I fear that
they will result in an overall much worse performing implementation on my
side.
The main problem I have in my scenario is that consumer requests are
coming in via stateless HTTP requests (each request is standalone and
specifies topics+partitions+offsets to read data from) and I need to find a
good way to do connection pooling to the Kafka backend for good
performance. The SimpleConsumer would allow me to do that, whereas an
approach with the new Kafka 0.9 consumer API seems to have a lot more
overhead.

Basically, what I am looking for is a way to pool connections per Kafka
broker host, independent of the topics/partitions/clients/..., so each
Tomcat app server would keep N disjoint connection pools if I have N
Kafka broker hosts.
I would then keep some central metadata which tells me which hosts are the
leaders for which topic+partition, and for an incoming HTTP client request
I'd just take a Kafka connection from the pool for that particular broker
host, request the data and return the connection to the pool. This means
that a Kafka broker host will get requests from lots of different end
consumers via the same TCP connection (sequentially of course).
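
The pooling scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `BrokerPools`, `borrow` and `giveBack` are hypothetical names, the type parameter `C` stands in for a SimpleConsumer-like connection, and the `connect` function that opens a connection to a broker host is supplied by the caller.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-broker pool; C stands in for a SimpleConsumer-like connection. */
final class BrokerPools<C> {
    // "topic-partition" -> leader broker host (the central metadata).
    private final Map<String, String> leaderByPartition = new ConcurrentHashMap<>();
    // Broker host -> idle connections to that host.
    private final Map<String, BlockingQueue<C>> idleByBroker = new ConcurrentHashMap<>();
    private final Function<String, C> connect;
    private final int maxIdlePerBroker;

    BrokerPools(Function<String, C> connect, int maxIdlePerBroker) {
        this.connect = connect;
        this.maxIdlePerBroker = maxIdlePerBroker;
    }

    /** Records which broker currently leads a topic partition. */
    void setLeader(String topic, int partition, String brokerHost) {
        leaderByPartition.put(topic + "-" + partition, brokerHost);
    }

    /** Looks up the leader host for a topic partition from the stored metadata. */
    String leaderFor(String topic, int partition) {
        String host = leaderByPartition.get(topic + "-" + partition);
        if (host == null) {
            throw new IllegalStateException("no leader known for " + topic + "-" + partition);
        }
        return host;
    }

    /** Takes an idle connection to the broker, or opens a new one if none is idle. */
    C borrow(String brokerHost) {
        C idle = queue(brokerHost).poll();
        return idle != null ? idle : connect.apply(brokerHost);
    }

    /** Returns a connection to its pool; false means the pool is full and the caller should close it. */
    boolean giveBack(String brokerHost, C connection) {
        return queue(brokerHost).offer(connection);
    }

    private BlockingQueue<C> queue(String brokerHost) {
        return idleByBroker.computeIfAbsent(brokerHost,
                host -> new ArrayBlockingQueue<>(maxIdlePerBroker));
    }
}
```

A request handler would call `leaderFor`, then `borrow`, issue the fetch, and finally `giveBack`. This sketch only bounds the number of idle connections per broker; a hard cap on open connections would need additional accounting.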

With the new Kafka consumer API I would have to subscribe/unsubscribe from
topics every time I take a connection from the pool, and as the request may
need to go to a different broker host than the last one, that wouldn't even
prevent all the connection/reconnection overhead. I guess I could create
one dedicated connection pool per topic-partition; that way
connection/reconnection overhead should be minimized, but I'd end
up with hundreds of connection pools per app server, which is also not a
good approach.
All in all, the planned design of the new consumer API just doesn't seem
to fit my use case well, which is why I am a bit anxious about the
SimpleConsumer API being deprecated.

Or am I missing something here? Thanks!

Greetings
Valentin

> On Mon, Sep 22, 2014 at 8:10 AM, Valentin <ka...@sblk.de> wrote:
> 
>>
>> Hello,
>>
>> I am currently working on a Kafka implementation and have a couple of
>> questions concerning the road map for the future.
>> As I am unsure where to put such questions, I decided to try my luck on
>> this mailing list. If this is the wrong place for such inquiries, I
>> apologize. In this case it would be great if someone could offer some
>> pointers as to where to find/get these answers.
>>
>> So, here I go :)
>>
>> 1) Consumer Redesign in Kafka 0.9
>> I found a number of documents explaining planned changes to the consumer
>> APIs for Kafka version 0.9. However, these documents are only mentioning
>> the high level consumer implementations. Does anyone know if the
>> kafka.javaapi.consumer.SimpleConsumer API/implementation will also change
>> with 0.9? Or will that stay more or less as it is now?
>>
>> 2) Pooling of Kafka Connections - SimpleConsumer
>> As I have a use case where the connection between the final consumers and
>> Kafka needs to happen via HTTP, I am concerned about performance
>> implications of the required HTTP wrapping. I am planning to implement a
>> custom HTTP API for Kafka producers and consumers which will be stateless
>> and where offset tracking will be done on the final consumer side. Now the
>> question here would be whether anyone has made experiences with pooling
>> connections to Kafka brokers in order to reuse them effectively for
>> incoming, stateless HTTP REST calls. An idea here would be to have one
>> connection pool per broker host and to keep a set of open
>> consumers/connections for each broker in those pools. Once I know which
>> broker is the leader for a requested topic partition for a REST call, I
>> could then use an already existing consumer/connection from that pool for
>> the processing of that REST call and then return it to the pool. So I'd be
>> able to have completely stateless REST call handling without having to
>> open/close Kafka connections all the time.
>>
>> 3) Pooling of Kafka Connections - KafkaConsumer (Kafka 0.9)
>> Now let's assume I want to implement the idea from 2) but with the high
>> level KafkaConsumer (to leave identifications of partition leaders and
>> error handling to it). Are already any implementation details known/decided
>> on how the subscribe, unsubscribe and seek methods will work internally?
>> Would I be able to somehow reuse connected KafkaConsumer objects in
>> connection pools? Could I for example call subscribe/unsubscribe/seek for
>> each HTTP request on a consumer to switch topics/partitions to the
>> currently needed set or would this be a very expensive operation (i.e.
>> because it would fetch metadata from Kafka to identify the leader for each
>> partition)?
>>
>> Greetings
>> Valentin
>>

Re: Questions about Kafka 0.9 API changes

Posted by Jun Rao <ju...@gmail.com>.

The new consumer api will also allow you to do what you want in a
SimpleConsumer (e.g., subscribe to a static set of partitions, control
initial offsets, etc), only more conveniently.

Thanks,

Jun

On Mon, Sep 22, 2014 at 8:10 AM, Valentin <ka...@sblk.de> wrote:

>
> Hello,
>
> I am currently working on a Kafka implementation and have a couple of
> questions concerning the road map for the future.
> As I am unsure where to put such questions, I decided to try my luck on
> this mailing list. If this is the wrong place for such inquiries, I
> apologize. In this case it would be great if someone could offer some
> pointers as to where to find/get these answers.
>
> So, here I go :)
>
> 1) Consumer Redesign in Kafka 0.9
> I found a number of documents explaining planned changes to the consumer
> APIs for Kafka version 0.9. However, these documents are only mentioning
> the high level consumer implementations. Does anyone know if the
> kafka.javaapi.consumer.SimpleConsumer API/implementation will also change
> with 0.9? Or will that stay more or less as it is now?
>
> 2) Pooling of Kafka Connections - SimpleConsumer
> As I have a use case where the connection between the final consumers and
> Kafka needs to happen via HTTP, I am concerned about performance
> implications of the required HTTP wrapping. I am planning to implement a
> custom HTTP API for Kafka producers and consumers which will be stateless
> and where offset tracking will be done on the final consumer side. Now the
> question here would be whether anyone has made experiences with pooling
> connections to Kafka brokers in order to reuse them effectively for
> incoming, stateless HTTP REST calls. An idea here would be to have one
> connection pool per broker host and to keep a set of open
> consumers/connections for each broker in those pools. Once I know which
> broker is the leader for a requested topic partition for a REST call, I
> could then use an already existing consumer/connection from that pool for
> the processing of that REST call and then return it to the pool. So I'd be
> able to have completely stateless REST call handling without having to
> open/close Kafka connections all the time.
>
> 3) Pooling of Kafka Connections - KafkaConsumer (Kafka 0.9)
> Now let's assume I want to implement the idea from 2) but with the high
> level KafkaConsumer (to leave identifications of partition leaders and
> error handling to it). Are already any implementation details known/decided
> on how the subscribe, unsubscribe and seek methods will work internally?
> Would I be able to somehow reuse connected KafkaConsumer objects in
> connection pools? Could I for example call subscribe/unsubscribe/seek for
> each HTTP request on a consumer to switch topics/partitions to the
> currently needed set or would this be a very expensive operation (i.e.
> because it would fetch metadata from Kafka to identify the leader for each
> partition)?
>
> Greetings
> Valentin
>