Posted to users@kafka.apache.org by Neha Narkhede <ne...@gmail.com> on 2012/06/12 01:52:26 UTC

Consumer re-design proposal

Hi,

Over the past few months, we've received quite a lot of feedback on the
consumer side features and design. Some of them are improvements to the
current consumer design and some are simply new feature/API requests. I
have attempted to write up the requirements that I've heard on this wiki -
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design

This would involve some significant changes to the consumer APIs, so we
would like to collect feedback on the proposal from our community. Since
the list of changes is not small, we would like to understand if some
features are preferred over others, and more importantly, if some features
are not required at all.

Since some part of this proposal is experimental and the consumer side
changes are non-trivial, we would like this initiative to not interfere
with the forthcoming replication release. However, it will be good to have
people from the community give this some thought and help out with the
JIRAs if interested. One way of managing this project could be creating a
separate branch from the kafka trunk and continuing development on it. Once
it is ready and in good shape, we can think about cutting another release
(after 0.8) for releasing the new consumer API. Do people have
preferences/concerns regarding creating a separate branch for this project?

Please feel free to start a discussion on this JIRA -
https://issues.apache.org/jira/browse/KAFKA-364

Thanks,

Neha

Re: Consumer re-design proposal

Posted by Jun Rao <ju...@gmail.com>.
If nobody objects, we can create a separate consumer redesign branch. This
way, everyone can see the changes and progress.

Thanks,

Jun

On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <ne...@gmail.com> wrote:

> Hi,
>
> Over the past few months, we've received quite a lot of feedback on the
> consumer side features and design. Some of them are improvements to the
> current consumer design and some are simply new feature/API requests. I
> have attempted to write up the requirements that I've heard on this wiki -
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>
> This would involve some significant changes to the consumer APIs, so we
> would like to collect feedback on the proposal from our community. Since
> the list of changes is not small, we would like to understand if some
> features are preferred over others, and more importantly, if some features
> are not required at all.
>
> Since some part of this proposal is experimental and the consumer side
> changes are non-trivial, we would like this initiative to not interfere
> with the forthcoming replication release. However, it will be good to have
> people from the community give this some thought and help out with the
> JIRAs if interested. One way of managing this project could be creating a
> separate branch from the kafka trunk and continuing development on it. Once
> it is ready and in good shape, we can think about cutting another release
> (after 0.8) for releasing the new consumer API. Do people have
> preferences/concerns regarding creating a separate branch for this project?
>
> Please feel free to start a discussion on this JIRA -
> https://issues.apache.org/jira/browse/KAFKA-364
>
> Thanks,
>
> Neha
>

Re: Consumer re-design proposal

Posted by Ross Black <ro...@gmail.com>.
I added a couple of comments to the issue
https://issues.apache.org/jira/browse/KAFKA-364
(I was not certain whether you wanted comments on the mailing list, the
wiki page, or the issue?)

Thanks,
Ross

Re: Consumer re-design proposal

Posted by Neha Narkhede <ne...@gmail.com>.
Dave,

> Say I start up my producers and consumers with a static list of hosts
> A,B,C.  Over time, I add host D, E, F.  D becomes the eventual new
> coordinator.  But the clients have an in-memory config which knows
> only of A, B, and C.  Now host D dies.  What happens (especially to
> clients who have never been told about/connected to E or F)?

Each consumer needs only one of the brokers in its config to be
alive. It talks to one of the live brokers in the bootstrap broker
list, finds out who the co-ordinator is, and then communicates with
the co-ordinator.
So, in your example, unless you take A, B AND C offline, it shouldn't
be a problem. And if you need to do that, you can do a rolling restart
of the consumers. Maybe there is another, easier way to solve the
bootstrap problem; this is just one of the solutions.
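
To make the bootstrap step concrete, here is a minimal sketch of the
lookup described above, using the A/B/C/D example. It is only an
illustration: find_coordinator, ask and NoLiveBroker are hypothetical
names, not part of any Kafka client, and the assumption that E takes
over as co-ordinator after D dies is mine.

```python
from typing import Callable, Iterable


class NoLiveBroker(Exception):
    """Raised when every broker in the bootstrap list is unreachable."""


def find_coordinator(bootstrap: Iterable[str], ask: Callable[[str], str]) -> str:
    """Return the co-ordinator reported by the first live bootstrap broker.

    `ask(host)` stands in for a hypothetical RPC: it returns the current
    co-ordinator's host, or raises ConnectionError if `host` is down.
    """
    for host in bootstrap:
        try:
            return ask(host)
        except ConnectionError:
            continue  # this bootstrap broker is dead; try the next one
    raise NoLiveBroker("all bootstrap brokers are offline")


# Simulated cluster: the client config only lists A, B and C.
# D was the co-ordinator and has died; E is assumed to have taken over.
live = {"B", "C", "E", "F"}
current_coordinator = "E"


def fake_ask(host: str) -> str:
    if host not in live:
        raise ConnectionError(host)
    return current_coordinator


found = find_coordinator(["A", "B", "C"], fake_ask)
print(found)
```

The client never needs D, E or F in its own config; it only needs one
of A, B or C to answer. If all three are gone, the lookup fails and
the config has to be refreshed.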

> Also is this proposal going to address the ability to drain a broker?
> Say I have a broker with old data that I want to keep around so you
> can reconsume old data, but I don't want anyone producing to it
> anymore.

This proposal is only dealing with Consumer API/protocol redesign.
What you want is the ability to decommission a broker gracefully. I
suggest you add your comments/requirements to this JIRA -
https://issues.apache.org/jira/browse/KAFKA-155

Also, how about moving your suggestions/comments to the JIRA/wiki?
Email is not quite the easiest way to keep track of this discussion.

Thanks,
Neha

On Wed, Jun 20, 2012 at 12:11 PM, Dave Barr <da...@gmail.com> wrote:
> Yes, which is why I asked how discovery will work.
>
> Proposal II reads a bit handwavy.
>
> "On startup, the consumer is informed with a list of the brokers."
>
> From where?  From whom?  I have to provide some mechanism now to feed
> this config to my consumers?  What happens when this changes?
>
> Say I start up my producers and consumers with a static list of hosts
> A,B,C.  Over time, I add host D, E, F.  D becomes the eventual new
> coordinator.  But the clients have an in-memory config which knows
> only of A, B, and C.  Now host D dies.  What happens (especially to
> clients who have never been told about/connected to E or F)?
>
> Also is this proposal going to address the ability to drain a broker?
> Say I have a broker with old data that I want to keep around so you
> can reconsume old data, but I don't want anyone producing to it
> anymore.
>
> --Dave
> On Wed, Jun 20, 2012 at 7:48 AM, Jun Rao <ju...@gmail.com> wrote:
>> Dave,
>>
>> Just to clarify. We are not removing ZK completely. The proposal is just
>> trying to see if we can remove the ZK dependency on the consumer client. ZK
>> will still be used at the broker.
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:
>>
>>> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <ne...@gmail.com>
>>> wrote:
>>> > One of the goals of thinning the Kafka consumer client is removing the
>>> > zookeeper client from the consumer. Without this, Kafka consumer
>>> > client would depend on the stability of a zookeeper client.
>>>
>>> If there's a stability issue with the zookeeper client, then that
>>> should be addressed.
>>>
>>> ZK is a fine tool for service discovery and coordination.  It seems
>>> like any new system that forced me, as a consumer, to use yet another
>>> system to bootstrap and discover where my brokers are for a topic
>>> would be a step backward.
>>>
>>> I'm curious, why, specifically, is removing ZK a design goal
>>> (especially when it's such a core component of the broker)?  I think
>>> of other projects, like HBase, which seem to have no issue with using
>>> ZK in their client.
>>>
>>> --Dave
>>>

Re: Consumer re-design proposal

Posted by Dave Barr <da...@gmail.com>.
Yes, which is why I asked how discovery will work.

Proposal II reads a bit handwavy.

"On startup, the consumer is informed with a list of the brokers."

From where?  From whom?  I have to provide some mechanism now to feed
this config to my consumers?  What happens when this changes?

Say I start up my producers and consumers with a static list of hosts
A,B,C.  Over time, I add host D, E, F.  D becomes the eventual new
coordinator.  But the clients have an in-memory config which knows
only of A, B, and C.  Now host D dies.  What happens (especially to
clients who have never been told about/connected to E or F)?

Also is this proposal going to address the ability to drain a broker?
Say I have a broker with old data that I want to keep around so you
can reconsume old data, but I don't want anyone producing to it
anymore.

--Dave
On Wed, Jun 20, 2012 at 7:48 AM, Jun Rao <ju...@gmail.com> wrote:
> Dave,
>
> Just to clarify. We are not removing ZK completely. The proposal is just
> trying to see if we can remove the ZK dependency on the consumer client. ZK
> will still be used at the broker.
>
> Thanks,
>
> Jun
>
> On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:
>
>> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <ne...@gmail.com>
>> wrote:
>> > One of the goals of thinning the Kafka consumer client is removing the
>> > zookeeper client from the consumer. Without this, Kafka consumer
>> > client would depend on the stability of a zookeeper client.
>>
>> If there's a stability issue with the zookeeper client, then that
>> should be addressed.
>>
>> ZK is a fine tool for service discovery and coordination.  It seems
>> like any new system that forced me, as a consumer, to use yet another
>> system to bootstrap and discover where my brokers are for a topic
>> would be a step backward.
>>
>> I'm curious, why, specifically, is removing ZK a design goal
>> (especially when it's such a core component of the broker)?  I think
>> of other projects, like HBase, which seem to have no issue with using
>> ZK in their client.
>>
>> --Dave
>>

Re: Consumer re-design proposal

Posted by Chris Burroughs <ch...@gmail.com>.
On 06/20/2012 11:25 AM, Taylor Gautier wrote:
> Fwiw I think this is the right move. We don't use ZK in our Kafka installation.

How do the consumers know who the brokers are?  List in a config file?

Re: Consumer re-design proposal

Posted by Neha Narkhede <ne...@gmail.com>.
The proposal is to move the rebalancing logic out of the consumer to
the broker side. The consumer client merely connects to the consumer
co-ordinator (on the Kafka brokers) and fetches data from the assigned
set of partitions. We have a high level design here -
https://cwiki.apache.org/confluence/display/KAFKA/Central+Consumer+Coordination#CentralConsumerCoordination-ProposalII
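
As a rough illustration of what broker-side rebalancing could look
like, the sketch below has a co-ordinator recompute the assignment
whenever group membership changes. The round-robin policy and the
name assign_partitions are assumptions made for this example; the
wiki page, not this sketch, defines the actual proposal.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Spread partitions over consumers round-robin (one possible policy)."""
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {c: [] for c in members}
    if not members:
        return assignment  # nobody to assign to
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


# The co-ordinator would recompute this whenever a consumer joins or leaves.
before = assign_partitions(["c1", "c2"], [0, 1, 2, 3])
after = assign_partitions(["c1", "c2", "c3"], [0, 1, 2, 3])
print(before)
print(after)
```

The consumer side then stays thin: it is told which partitions it
owns and fetches from them, without running any rebalancing logic or
a zookeeper client itself.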

Thanks,
Neha

On Wed, Jun 20, 2012 at 9:36 AM, Evan Chan <ev...@ooyala.com> wrote:
> How would you accomplish automatic rebalancing without ZK?
>
> On Wed, Jun 20, 2012 at 8:25 AM, Taylor Gautier <tg...@tagged.com> wrote:
>
>> Fwiw I think this is the right move. We don't use ZK in our Kafka
>> installation.
>>
>> One reason is that we do not want or need the complexity of having the
>> broker distribution in the clients.
>>
>> Moving towards this design in the core would help our installation be
>> more "standard".
>>
>>
>>
>> On Jun 20, 2012, at 7:48 AM, Jun Rao <ju...@gmail.com> wrote:
>>
>> > Dave,
>> >
>> > Just to clarify. We are not removing ZK completely. The proposal is just
>> > trying to see if we can remove the ZK dependency on the consumer client. ZK
>> > will still be used at the broker.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:
>> >
>> >> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>> >>> One of the goals of thinning the Kafka consumer client is removing the
>> >>> zookeeper client from the consumer. Without this, Kafka consumer
>> >>> client would depend on the stability of a zookeeper client.
>> >>
>> >> If there's a stability issue with the zookeeper client, then that
>> >> should be addressed.
>> >>
>> >> ZK is a fine tool for service discovery and coordination.  It seems
>> >> like any new system that forced me, as a consumer, to use yet another
>> >> system to bootstrap and discover where my brokers are for a topic
>> >> would be a step backward.
>> >>
>> >> I'm curious, why, specifically, is removing ZK a design goal
>> >> (especially when it's such a core component of the broker)?  I think
>> >> of other projects, like HBase, which seem to have no issue with using
>> >> ZK in their client.
>> >>
>> >> --Dave
>> >>
>>
>
>
>
> --
> --
> *Evan Chan*
> Senior Software Engineer |
> ev@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>

Re: Consumer re-design proposal

Posted by Evan Chan <ev...@ooyala.com>.
How would you accomplish automatic rebalancing without ZK?

On Wed, Jun 20, 2012 at 8:25 AM, Taylor Gautier <tg...@tagged.com> wrote:

> Fwiw I think this is the right move. We don't use ZK in our Kafka
> installation.
>
> One reason is that we do not want or need the complexity of having the
> broker distribution in the clients.
>
> Moving towards this design in the core would help our installation be
> more "standard".
>
>
>
> On Jun 20, 2012, at 7:48 AM, Jun Rao <ju...@gmail.com> wrote:
>
> > Dave,
> >
> > Just to clarify. We are not removing ZK completely. The proposal is just
> > trying to see if we can remove the ZK dependency on the consumer client. ZK
> > will still be used at the broker.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:
> >
> >> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> >>> One of the goals of thinning the Kafka consumer client is removing the
> >>> zookeeper client from the consumer. Without this, Kafka consumer
> >>> client would depend on the stability of a zookeeper client.
> >>
> >> If there's a stability issue with the zookeeper client, then that
> >> should be addressed.
> >>
> >> ZK is a fine tool for service discovery and coordination.  It seems
> >> like any new system that forced me, as a consumer, to use yet another
> >> system to bootstrap and discover where my brokers are for a topic
> >> would be a step backward.
> >>
> >> I'm curious, why, specifically, is removing ZK a design goal
> >> (especially when it's such a core component of the broker)?  I think
> >> of other projects, like HBase, which seem to have no issue with using
> >> ZK in their client.
> >>
> >> --Dave
> >>
>



-- 
--
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>

Re: Consumer re-design proposal

Posted by Taylor Gautier <tg...@tagged.com>.
Fwiw I think this is the right move. We don't use ZK in our Kafka installation.

One reason is that we do not want or need the complexity of having the
broker distribution in the clients.

Moving towards this design in the core would help our installation be
more "standard".



On Jun 20, 2012, at 7:48 AM, Jun Rao <ju...@gmail.com> wrote:

> Dave,
>
> Just to clarify. We are not removing ZK completely. The proposal is just
> trying to see if we can remove the ZK dependency on the consumer client. ZK
> will still be used at the broker.
>
> Thanks,
>
> Jun
>
> On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:
>
>> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <ne...@gmail.com>
>> wrote:
>>> One of the goals of thinning the Kafka consumer client is removing the
>>> zookeeper client from the consumer. Without this, Kafka consumer
>>> client would depend on the stability of a zookeeper client.
>>
>> If there's a stability issue with the zookeeper client, then that
>> should be addressed.
>>
>> ZK is a fine tool for service discovery and coordination.  It seems
>> like any new system that forced me, as a consumer, to use yet another
>> system to bootstrap and discover where my brokers are for a topic
>> would be a step backward.
>>
>> I'm curious, why, specifically, is removing ZK a design goal
>> (especially when it's such a core component of the broker)?  I think
>> of other projects, like HBase, which seem to have no issue with using
>> ZK in their client.
>>
>> --Dave
>>

Re: Consumer re-design proposal

Posted by Jun Rao <ju...@gmail.com>.
Dave,

Just to clarify. We are not removing ZK completely. The proposal is just
trying to see if we can remove the ZK dependency on the consumer client. ZK
will still be used at the broker.

Thanks,

Jun

On Tue, Jun 19, 2012 at 10:49 PM, Dave Barr <da...@gmail.com> wrote:

> On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <ne...@gmail.com>
> wrote:
> > One of the goals of thinning the Kafka consumer client is removing the
> > zookeeper client from the consumer. Without this, Kafka consumer
> > client would depend on the stability of a zookeeper client.
>
> If there's a stability issue with the zookeeper client, then that
> should be addressed.
>
> ZK is a fine tool for service discovery and coordination.  It seems
> like any new system that forced me, as a consumer, to use yet another
> system to bootstrap and discover where my brokers are for a topic
> would be a step backward.
>
> I'm curious, why, specifically, is removing ZK a design goal
> (especially when it's such a core component of the broker)?  I think
> of other projects, like HBase, which seem to have no issue with using
> ZK in their client.
>
> --Dave
>

Re: Consumer re-design proposal

Posted by Dave Barr <da...@gmail.com>.
On Tue, Jun 19, 2012 at 10:09 AM, Neha Narkhede <ne...@gmail.com> wrote:
> One of the goals of thinning the Kafka consumer client is removing the
> zookeeper client from the consumer. Without this, Kafka consumer
> client would depend on the stability of a zookeeper client.

If there's a stability issue with the zookeeper client, then that
should be addressed.

ZK is a fine tool for service discovery and coordination.  It seems
like any new system that forced me, as a consumer, to use yet another
system to bootstrap and discover where my brokers are for a topic
would be a step backward.

I'm curious, why, specifically, is removing ZK a design goal
(especially when it's such a core component of the broker)?  I think
of other projects, like HBase, which seem to have no issue with using
ZK in their client.

--Dave

Re: Consumer re-design proposal

Posted by Neha Narkhede <ne...@gmail.com>.
Chris,

One of the goals of thinning the Kafka consumer client is removing the
zookeeper client from the consumer. Without this, Kafka consumer
client would depend on the stability of a zookeeper client.

>> For a consumer that wants "Manual partition assignment" and "Manual offset management", what does the proposed offer over the existing SimpleConsumer?

Right now, we have 2 Kafka consumer clients, and some functionality is
possible in one but not the other. Some users have requested features
that would require some combination of the functionalities offered by
the two consumer clients. We think it might be a good idea to collect
feedback and try to design a single consumer client API that satisfies
these requirements. But it's unclear if this is quite the right
solution.
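
One way to picture a single consumer API with optional features is a
set of independent switches, as in the sketch below. ConsumerOptions
and its field names are hypothetical and only illustrate the idea;
no concrete API has been proposed yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerOptions:
    """Hypothetical switches for a single consumer API (not a real Kafka class)."""

    manual_partition_assignment: bool = False
    manual_offset_management: bool = False

    def describe(self) -> str:
        partitions = (
            "caller picks partitions"
            if self.manual_partition_assignment
            else "co-ordinator assigns partitions"
        )
        offsets = (
            "caller stores offsets"
            if self.manual_offset_management
            else "client commits offsets"
        )
        return f"{partitions}; {offsets}"


# Defaults behave like today's high level consumer.
high_level_like = ConsumerOptions()
# Everything manual is close to today's SimpleConsumer.
simple_like = ConsumerOptions(manual_partition_assignment=True,
                              manual_offset_management=True)
# A combination that neither existing client offers on its own.
mixed = ConsumerOptions(manual_offset_management=True)

for options in (high_level_like, simple_like, mixed):
    print(options.describe())
```

The third case is the kind of combination users have been asking
for: automatic partition assignment together with manually managed
offsets.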

We will be writing up some concrete API/protocol proposal soon. I will
send it around for more detailed feedback.

Thanks,
Neha

On Mon, Jun 18, 2012 at 7:00 PM, Chris Burroughs
<ch...@gmail.com> wrote:
> On 06/12/2012 12:59 PM, Jay Kreps wrote:
>>    2. Try to replace the "simple consumer" and "high level consumer" with a
>>    single, general interface that has all the advantages of both.
>
> I've read through the wiki pages but think I'm missing the forest for
> the trees.
>
> For a consumer that wants "Manual partition assignment" and "Manual
> offset management", what does the proposed offer over the existing
> SimpleConsumer?

Re: Consumer re-design proposal

Posted by Chris Burroughs <ch...@gmail.com>.
+list

The sentiment makes sense.  I just know that, for the people I talked to,
the (arguably deceptive) simplicity of the SimpleConsumer was a significant
selling point.

On 2012-06-19 01:18, Jay Kreps wrote:
> I think the hope was to have a single API and make each piece of
> functionality optional. E.g. you could have manual offset management OR you
> could have manual partition assignment OR both. You are right that if you
> disable everything you are left with something not much better than the
> current simple consumer except that it would support leadership changes (a
> la replication). But somehow having a single public API seems like a good
> practice just to have a single place where monitoring, testing, etc.
> 
> -Jay
> 
> On Mon, Jun 18, 2012 at 7:00 PM, Chris Burroughs
> <ch...@gmail.com> wrote:
> 
>> On 06/12/2012 12:59 PM, Jay Kreps wrote:
>>>    2. Try to replace the "simple consumer" and "high level consumer" with a
>>>    single, general interface that has all the advantages of both.
>>
>> I've read through the wiki pages but think I'm missing the forest for
>> the trees.
>>
>> For a consumer that wants "Manual partition assignment" and "Manual
>> offset management", what does the proposed offer over the existing
>> SimpleConsumer?
>>
> 


Re: Consumer re-design proposal

Posted by Chris Burroughs <ch...@gmail.com>.
On 06/12/2012 12:59 PM, Jay Kreps wrote:
>    2. Try to replace the "simple consumer" and "high level consumer" with a
>    single, general interface that has all the advantages of both.

I've read through the wiki pages but think I'm missing the forest for
the trees.

For a consumer that wants "Manual partition assignment" and "Manual
offset management", what does the proposed offer over the existing
SimpleConsumer?

Re: Consumer re-design proposal

Posted by Chris Burroughs <ch...@gmail.com>.
On 06/18/2012 06:29 PM, Jay Kreps wrote:
> This would definitely be good to have. As Joel says we are thinking
> about re-factoring the consumer protocol to make it much thinner to help
> enable non-java consumers much more easily (i.e. get rid of zookeeper and
> other things).

This isn't removing ZK from Kafka, but rather having the clients only
interact with a broker side API, right?

Re: Consumer re-design proposal

Posted by Jay Kreps <ja...@gmail.com>.
This would definitely be good to have. As Joel says we are thinking
about re-factoring the consumer protocol to make it much thinner to help
enable non-java consumers much more easily (i.e. get rid of zookeeper and
other things).

There is a very brief write up on this here:
https://cwiki.apache.org/confluence/display/KAFKA/Central+Consumer+Coordination

Take a look and let us know what you think. It might make sense to wait
until this is done before adding consumer clients.

-Jay

On Mon, Jun 18, 2012 at 1:41 PM, Joel Koshy <jj...@gmail.com> wrote:

> That's true - I think that's one of the major motivations of the consumer
> re-design. Right now, the consumer implementation is very thick which makes
> it difficult to maintain correct implementations across multiple languages.
> It will be much easier to implement a consumer with the thinner logic - and
> as you pointed out, many languages have pretty good bindings with native C
> libraries so technically we would go pretty far with just a JVM and native
> (C) implementation of the consumer logic.
>
> Joel
>
> On Mon, Jun 18, 2012 at 11:40 AM, Sybrandy, Casey <
> Casey.Sybrandy@six3systems.com> wrote:
>
> > Would porting the consumer/producer code to C be a good idea?  I say this
> > because at least with most languages I know of, leveraging a C library is
> > pretty easy.  This way, you would have to maintain only the C library and
> > others can make/maintain wrappers for their languages.  Having to port to
> > other languages is going to cause you to have a significant amount of
> > maintenance if you change the protocol in the future.
> >
> > ________________________________________
> > From: Neha Narkhede [neha.narkhede@gmail.com]
> > Sent: Thursday, June 14, 2012 5:53 PM
> > To: kafka-users@incubator.apache.org
> > Cc: kafka-dev@incubator.apache.org
> > Subject: Re: Consumer re-design proposal
> >
> > Thanks for the feedback ! I moved it to
> > https://issues.apache.org/jira/browse/KAFKA-364, so that we can keep track
> > of these.
> >
> > -Neha
> >
> > On Thu, Jun 14, 2012 at 2:45 PM, Marcos Juarez <mj...@gmail.com> wrote:
> >
> > > Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary
> > > value, and then write that offset into ZK".
> > >
> > > We're currently running into a scenario where we would like to have 100%
> > > reliability, and we're losing a few messages when a connection is broken,
> > > but there were still a few messages in the OS TCP buffers. So, we're
> > > planning on shifting the ZK offset by a few seconds "back in time" if we
> > > detect a broker has gone down, to make sure all the messages will be
> > > actually delivered to the end consumer when that broker comes back up, even
> > > if there's a small amount of overlapping messages.
> > >
> > > Thanks,
> > >
> > > Marcos
> > >
> > >
> > > On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:
> > >
> > > > I would like to throw in a couple use cases:
> > > >
> > > >
> > > >   - Allow the new consumer to reset its offset to either the current
> > > >   largest or smallest.  This would be a great way to restart a process that
> > > >   has fallen behind.  The only way I know how to do this today, with the
> > > >   high-level consumer, is to delete the ZK nodes manually and restart the
> > > >   consumer.
> > > >   - Allow the consumer to reset its offset to some arbitrary value, and
> > > >   then write that offset into ZK.    Kind of like the first case, but would
> > > >   make rewinding/replays much easier.
> > > >
> > > > Modularity (the ability to layer the ZK infrastructure on top of the simple
> > > > interface) would be great.
> > > >
> > > > thanks,
> > > > Evan
> > > >
> > > >
> > > > On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
> > > >
> > > >> This is a great summary Neha. It would be good to get people's feedback on
> > > >> this since we don't want to keep breaking api and
> > > >> protocol compatibility here, so the hope is to really get it right this
> > > >> time now that we have really seen all the use cases and live with the
> > > >> output for a while. I think the consumer design is a pretty hard protocol
> > > >> and API design problem, so it's fun to think about.
> > > >>
> > > >> If I were to summarize Neha's requirements list, I think there are three
> > > >> high-level goals:
> > > >>
> > > >>  1. Simplify the consumer protocol to enable ease of development of
> > > >>  consumer clients in other languages
> > > >>  2. Try to replace the "simple consumer" and "high level consumer" with a
> > > >>  single, general interface that has all the advantages of both.
> > > >>  3. Support a bunch of use cases that either we didn't think of, or that
> > > >>  weren't possible in the partitioning model of the pre-0.8 code base.
> > > >>
> > > >> -Jay
> > > >>
> > > >>
> > > >> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Over the past few months, we've received quite a lot of feedback on the
> > > >>> consumer side features and design. Some of them are improvements to the
> > > >>> current consumer design and some are simply new feature/API requests. I
> > > >>> have attempted to write up the requirements that I've heard on this wiki -
> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> > > >>>
> > > >>> This would involve some significant changes to the consumer APIs, so we
> > > >>> would like to collect feedback on the proposal from our community. Since
> > > >>> the list of changes is not small, we would like to understand if some
> > > >>> features are preferred over others, and more importantly, if some features
> > > >>> are not required at all.
> > > >>>
> > > >>> Since some part of this proposal is experimental and the consumer side
> > > >>> changes are non-trivial, we would like this initiative to not interfere
> > > >>> with the forthcoming replication release. However, it will be good to have
> > > >>> people from the community give this some thought and help out with the
> > > >>> JIRAs if interested. One way of managing this project could be creating a
> > > >>> separate branch from the kafka trunk and continuing development on it. Once
> > > >>> it is ready and in good shape, we can think about cutting another release
> > > >>> (after 0.8) for releasing the new consumer API. Do people have
> > > >>> preferences/concerns regarding creating a separate branch for this project?
> > > >>>
> > > >>> Please feel free to start a discussion on this JIRA -
> > > >>> https://issues.apache.org/jira/browse/KAFKA-364
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Neha
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > --
> > > > *Evan Chan*
> > > > Senior Software Engineer |
> > > > ev@ooyala.com | (650) 996-4600
> > > > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > > > @ooyala<http://www.twitter.com/ooyala>
> > >
> > >
> >
>
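
The offset-reset use cases quoted above (jump to the smallest or
largest offset, or rewind to an arbitrary earlier point and accept
some duplicates) can be sketched as follows. This is a hedged
illustration only: resolve_offset and rewind are hypothetical helper
names, and the rewind is expressed as an offset distance rather than
the "few seconds" mentioned in the thread, since mapping time to
offsets is not specified here.

```python
from typing import Union


def resolve_offset(target: Union[str, int], smallest: int, largest: int) -> int:
    """Map a reset request onto a concrete offset within [smallest, largest]."""
    if target == "smallest":
        return smallest
    if target == "largest":
        return largest
    if isinstance(target, int):
        # Clamp arbitrary offsets to what the broker still has.
        return max(smallest, min(target, largest))
    raise ValueError(f"unknown reset target: {target!r}")


def rewind(committed: int, overlap: int, smallest: int, largest: int) -> int:
    """Step back `overlap` from the committed offset, accepting duplicates."""
    return resolve_offset(committed - overlap, smallest, largest)


# A consumer that has fallen behind skips to the newest data.
print(resolve_offset("largest", smallest=0, largest=2000))
# After a broker failure, re-read a small overlap instead of losing messages.
print(rewind(committed=1000, overlap=50, smallest=0, largest=2000))
```

The resolved value would then be written wherever offsets are stored
(ZK today), which is the part the thread asks the new API to expose.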

Re: Consumer re-design proposal

Posted by Joel Koshy <jj...@gmail.com>.
That's true - I think that's one of the major motivations for the consumer
re-design. Right now, the consumer implementation is very thick, which makes
it difficult to maintain correct implementations across multiple languages.
It will be much easier to implement a consumer once the client logic is
thinner - and as you pointed out, many languages have pretty good bindings to
native C libraries, so technically we could go pretty far with just a JVM and
a native (C) implementation of the consumer logic.

Joel

On Mon, Jun 18, 2012 at 11:40 AM, Sybrandy, Casey <
Casey.Sybrandy@six3systems.com> wrote:

> Would porting the consumer/producer code to C be a good idea?  I say this
> because at least with most languages I know of, leveraging a C library is
> pretty easy.  This way, you would have to maintain only the C library and
> others can make/maintain wrappers for their languages.  Having to port to
> other languages is going to cause you to have a significant amount of
> maintenance if you change the protocol in the future.
>
> ________________________________________
> From: Neha Narkhede [neha.narkhede@gmail.com]
> Sent: Thursday, June 14, 2012 5:53 PM
> To: kafka-users@incubator.apache.org
> Cc: kafka-dev@incubator.apache.org
> Subject: Re: Consumer re-design proposal
>
> Thanks for the feedback ! I moved it to
> https://issues.apache.org/jira/browse/KAFKA-364, so that we can keep track
> of these.
>
> -Neha
>
> On Thu, Jun 14, 2012 at 2:45 PM, Marcos Juarez <mj...@gmail.com> wrote:
>
> > Throwing a +1 on "Allow the consumer to reset its offset to some
> arbitrary
> > value, and then write that offset into ZK".
> >
> > We're currently running into a scenario where we would like to have 100%
> > reliability, and we're losing a few messages when a connection is broken,
> > but there were still a few messages in the OS TCP buffers. So, we're
> > planning on shifting the ZK offset by a few seconds "back in time" if we
> > detect a broker has gone down, to make sure all the messages will be
> > actually delivered to the end consumer when that broker comes back up,
> even
> > if there's a small amount of overlapping messages.
> >
> > Thanks,
> >
> > Marcos
> >
> >
> > On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:
> >
> > > I would like to throw in a couple use cases:
> > >
> > >
> > >   - Allow the new consumer to reset its offset to either the current
> > >   largest or smallest.  This would be a great way to restart a process
> > that
> > >   has fallen behind.  The only way I know how to do this today, with
> the
> > >   high-level consumer, is to delete the ZK nodes manually and restart
> the
> > >   consumer.
> > >   - Allow the consumer to reset its offset to some arbitrary value, and
> > >   then write that offset into ZK.    Kind of like the first case, but
> > would
> > >   make rewinding/replays much easier.
> > >
> > > Modularity (the ability to layer the ZK infrastructure on top of the
> > simple
> > > interface) would be great.
> > >
> > > thanks,
> > > Evan
> > >
> > >
> > > On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com>
> wrote:
> > >
> > >> This is a great summary Neha. It would be good to get people's
> feedback
> > on
> > >> this since we don't want to keep breaking api and
> > >> protocol compatibility here, so the hope is to really get it right
> this
> > >> time now that we have really seen all the use cases and live with the
> > >> output for a while. I think the consumer design is a pretty hard
> > protocol
> > >> and API design problem, so its fun to think about.
> > >>
> > >> If I were to summarize Neha's requirements list, I think there are
> three
> > >> high-level goals:
> > >>
> > >>  1. Simplify the consumer protocol to enable ease of development of
> > >>  consumer clients in other languages
> > >>  2. Try to replace the "simple consumer" and "high level consumer"
> with
> > a
> > >>  single, general interface that has all the advantages of both.
> > >>  3. Support a bunch of use cases that either we didn't think of, or
> that
> > >>  weren't possible in the partitioning model of the pre-0.8 code base.
> > >>
> > >> -Jay
> > >>
> > >>
> > >> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <
> neha.narkhede@gmail.com
> > >>> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Over the past few months, we've received quite a lot of feedback on
> the
> > >>> consumer side features and design. Some of them are improvements to
> the
> > >>> current consumer design and some are simply new feature/API
> requests. I
> > >>> have attempted to write up the requirements that I've heard on this
> > wiki
> > >> -
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> > >>>
> > >>> This would involve some significant changes to the consumer APIs, so
> we
> > >>> would like to collect feedback on the proposal from our community.
> > Since
> > >>> the list of changes is not small, we would like to understand if some
> > >>> features are preferred over others, and more importantly, if some
> > >> features
> > >>> are not required at all.
> > >>>
> > >>> Since some part of this proposal is experimental and the consumer
> side
> > >>> changes are non-trivial, we would like this initiative to not
> interfere
> > >>> with the forthcoming replication release. However, it will be good to
> > >> have
> > >>> people from the community give this some thought and help out with
> the
> > >>> JIRAs if interested. One way of managing this project could be
> > creating a
> > >>> separate branch from the kafka trunk and continue development on it.
> > Once
> > >>> it is ready and in good shape, we can think about cutting another
> > release
> > >>> (after 0.8) for the releasing the new consumer API. Do people have
> > >>> preferences/concerns regarding creating a separate branch for this
> > >> project
> > >>> ?
> > >>>
> > >>> Please feel free to start a discussion on this JIRA -
> > >>> https://issues.apache.org/jira/browse/KAFKA-364
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Neha
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > --
> > > *Evan Chan*
> > > Senior Software Engineer |
> > > ev@ooyala.com | (650) 996-4600
> > > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > > @ooyala<http://www.twitter.com/ooyala>
> >
> >
>

RE: Consumer re-design proposal

Posted by "Sybrandy, Casey" <Ca...@Six3Systems.com>.
Would porting the consumer/producer code to C be a good idea?  I say this because at least with most languages I know of, leveraging a C library is pretty easy.  This way, you would have to maintain only the C library and others can make/maintain wrappers for their languages.  Having to port to other languages is going to cause you to have a significant amount of maintenance if you change the protocol in the future.

________________________________________
From: Neha Narkhede [neha.narkhede@gmail.com]
Sent: Thursday, June 14, 2012 5:53 PM
To: kafka-users@incubator.apache.org
Cc: kafka-dev@incubator.apache.org
Subject: Re: Consumer re-design proposal

Thanks for the feedback ! I moved it to
https://issues.apache.org/jira/browse/KAFKA-364, so that we can keep track
of these.

-Neha

On Thu, Jun 14, 2012 at 2:45 PM, Marcos Juarez <mj...@gmail.com> wrote:

> Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary
> value, and then write that offset into ZK".
>
> We're currently running into a scenario where we would like to have 100%
> reliability, and we're losing a few messages when a connection is broken,
> but there were still a few messages in the OS TCP buffers. So, we're
> planning on shifting the ZK offset by a few seconds "back in time" if we
> detect a broker has gone down, to make sure all the messages will be
> actually delivered to the end consumer when that broker comes back up, even
> if there's a small amount of overlapping messages.
>
> Thanks,
>
> Marcos
>
>
> On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:
>
> > I would like to throw in a couple use cases:
> >
> >
> >   - Allow the new consumer to reset its offset to either the current
> >   largest or smallest.  This would be a great way to restart a process
> that
> >   has fallen behind.  The only way I know how to do this today, with the
> >   high-level consumer, is to delete the ZK nodes manually and restart the
> >   consumer.
> >   - Allow the consumer to reset its offset to some arbitrary value, and
> >   then write that offset into ZK.    Kind of like the first case, but
> would
> >   make rewinding/replays much easier.
> >
> > Modularity (the ability to layer the ZK infrastructure on top of the
> simple
> > interface) would be great.
> >
> > thanks,
> > Evan
> >
> >
> > On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> >> This is a great summary Neha. It would be good to get people's feedback
> on
> >> this since we don't want to keep breaking api and
> >> protocol compatibility here, so the hope is to really get it right this
> >> time now that we have really seen all the use cases and live with the
> >> output for a while. I think the consumer design is a pretty hard
> protocol
> >> and API design problem, so its fun to think about.
> >>
> >> If I were to summarize Neha's requirements list, I think there are three
> >> high-level goals:
> >>
> >>  1. Simplify the consumer protocol to enable ease of development of
> >>  consumer clients in other languages
> >>  2. Try to replace the "simple consumer" and "high level consumer" with
> a
> >>  single, general interface that has all the advantages of both.
> >>  3. Support a bunch of use cases that either we didn't think of, or that
> >>  weren't possible in the partitioning model of the pre-0.8 code base.
> >>
> >> -Jay
> >>
> >>
> >> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
> >>> wrote:
> >>
> >>> Hi,
> >>>
> >>> Over the past few months, we've received quite a lot of feedback on the
> >>> consumer side features and design. Some of them are improvements to the
> >>> current consumer design and some are simply new feature/API requests. I
> >>> have attempted to write up the requirements that I've heard on this
> wiki
> >> -
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> >>>
> >>> This would involve some significant changes to the consumer APIs, so we
> >>> would like to collect feedback on the proposal from our community.
> Since
> >>> the list of changes is not small, we would like to understand if some
> >>> features are preferred over others, and more importantly, if some
> >> features
> >>> are not required at all.
> >>>
> >>> Since some part of this proposal is experimental and the consumer side
> >>> changes are non-trivial, we would like this initiative to not interfere
> >>> with the forthcoming replication release. However, it will be good to
> >> have
> >>> people from the community give this some thought and help out with the
> >>> JIRAs if interested. One way of managing this project could be
> creating a
> >>> separate branch from the kafka trunk and continue development on it.
> Once
> >>> it is ready and in good shape, we can think about cutting another
> release
> >>> (after 0.8) for the releasing the new consumer API. Do people have
> >>> preferences/concerns regarding creating a separate branch for this
> >> project
> >>> ?
> >>>
> >>> Please feel free to start a discussion on this JIRA -
> >>> https://issues.apache.org/jira/browse/KAFKA-364
> >>>
> >>> Thanks,
> >>>
> >>> Neha
> >>>
> >>
> >
> >
> >
> > --
> > --
> > *Evan Chan*
> > Senior Software Engineer |
> > ev@ooyala.com | (650) 996-4600
> > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > @ooyala<http://www.twitter.com/ooyala>
>
>

Re: Consumer re-design proposal

Posted by Neha Narkhede <ne...@gmail.com>.
Thanks for the feedback! I moved it to
https://issues.apache.org/jira/browse/KAFKA-364, so that we can keep track
of these.

-Neha

On Thu, Jun 14, 2012 at 2:45 PM, Marcos Juarez <mj...@gmail.com> wrote:

> Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary
> value, and then write that offset into ZK".
>
> We're currently running into a scenario where we would like to have 100%
> reliability, and we're losing a few messages when a connection is broken,
> but there were still a few messages in the OS TCP buffers. So, we're
> planning on shifting the ZK offset by a few seconds "back in time" if we
> detect a broker has gone down, to make sure all the messages will be
> actually delivered to the end consumer when that broker comes back up, even
> if there's a small amount of overlapping messages.
>
> Thanks,
>
> Marcos
>
>
> On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:
>
> > I would like to throw in a couple use cases:
> >
> >
> >   - Allow the new consumer to reset its offset to either the current
> >   largest or smallest.  This would be a great way to restart a process
> that
> >   has fallen behind.  The only way I know how to do this today, with the
> >   high-level consumer, is to delete the ZK nodes manually and restart the
> >   consumer.
> >   - Allow the consumer to reset its offset to some arbitrary value, and
> >   then write that offset into ZK.    Kind of like the first case, but
> would
> >   make rewinding/replays much easier.
> >
> > Modularity (the ability to layer the ZK infrastructure on top of the
> simple
> > interface) would be great.
> >
> > thanks,
> > Evan
> >
> >
> > On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> >> This is a great summary Neha. It would be good to get people's feedback
> on
> >> this since we don't want to keep breaking api and
> >> protocol compatibility here, so the hope is to really get it right this
> >> time now that we have really seen all the use cases and live with the
> >> output for a while. I think the consumer design is a pretty hard
> protocol
> >> and API design problem, so its fun to think about.
> >>
> >> If I were to summarize Neha's requirements list, I think there are three
> >> high-level goals:
> >>
> >>  1. Simplify the consumer protocol to enable ease of development of
> >>  consumer clients in other languages
> >>  2. Try to replace the "simple consumer" and "high level consumer" with
> a
> >>  single, general interface that has all the advantages of both.
> >>  3. Support a bunch of use cases that either we didn't think of, or that
> >>  weren't possible in the partitioning model of the pre-0.8 code base.
> >>
> >> -Jay
> >>
> >>
> >> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
> >>> wrote:
> >>
> >>> Hi,
> >>>
> >>> Over the past few months, we've received quite a lot of feedback on the
> >>> consumer side features and design. Some of them are improvements to the
> >>> current consumer design and some are simply new feature/API requests. I
> >>> have attempted to write up the requirements that I've heard on this
> wiki
> >> -
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> >>>
> >>> This would involve some significant changes to the consumer APIs, so we
> >>> would like to collect feedback on the proposal from our community.
> Since
> >>> the list of changes is not small, we would like to understand if some
> >>> features are preferred over others, and more importantly, if some
> >> features
> >>> are not required at all.
> >>>
> >>> Since some part of this proposal is experimental and the consumer side
> >>> changes are non-trivial, we would like this initiative to not interfere
> >>> with the forthcoming replication release. However, it will be good to
> >> have
> >>> people from the community give this some thought and help out with the
> >>> JIRAs if interested. One way of managing this project could be
> creating a
> >>> separate branch from the kafka trunk and continue development on it.
> Once
> >>> it is ready and in good shape, we can think about cutting another
> release
> >>> (after 0.8) for the releasing the new consumer API. Do people have
> >>> preferences/concerns regarding creating a separate branch for this
> >> project
> >>> ?
> >>>
> >>> Please feel free to start a discussion on this JIRA -
> >>> https://issues.apache.org/jira/browse/KAFKA-364
> >>>
> >>> Thanks,
> >>>
> >>> Neha
> >>>
> >>
> >
> >
> >
> > --
> > --
> > *Evan Chan*
> > Senior Software Engineer |
> > ev@ooyala.com | (650) 996-4600
> > www.ooyala.com | blog <http://www.ooyala.com/blog> |
> > @ooyala<http://www.twitter.com/ooyala>
>
>

Re: Consumer re-design proposal

Posted by Marcos Juarez <mj...@gmail.com>.
Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".

We're currently running into a scenario where we would like 100% reliability, but we lose a few messages when a connection breaks while some messages are still in the OS TCP buffers. So we're planning to shift the ZK offset "back in time" by a few seconds if we detect that a broker has gone down, to make sure all the messages are actually delivered to the end consumer when that broker comes back up, even if that means a small number of duplicate messages.

Thanks,

Marcos
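
The rewind described in this message can be sketched roughly as follows, in Python. This is an illustration under stated assumptions, not Kafka code: it assumes the consumer records (timestamp, offset) checkpoints as it reads, and InMemoryOffsetStore is a hypothetical stand-in for the ZK offset node. None of these names exist in Kafka.

```python
import bisect


class InMemoryOffsetStore:
    """Hypothetical stand-in for the ZooKeeper node that holds a consumer offset."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, partition, offset):
        self._offsets[(group, partition)] = offset

    def fetch(self, group, partition):
        return self._offsets[(group, partition)]


def rewind_offset(checkpoints, now, rewind_seconds):
    """Pick the offset to resume from after a broker failure.

    checkpoints: non-empty list of (timestamp, offset) pairs, sorted by
    timestamp, recorded by the consumer while reading (an assumption of
    this sketch). Returns the offset of the newest checkpoint taken at or
    before now - rewind_seconds, or the oldest checkpoint if none is old
    enough.
    """
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")
    target = now - rewind_seconds
    timestamps = [ts for ts, _ in checkpoints]
    # Index of the last checkpoint with timestamp <= target (-1 if none).
    index = bisect.bisect_right(timestamps, target) - 1
    return checkpoints[max(index, 0)][1]


# Usage: the consumer had committed offset 1400, then detects a dead broker.
store = InMemoryOffsetStore()
checkpoints = [(100.0, 0), (101.0, 500), (102.0, 900), (103.0, 1400)]
store.commit("my-group", 0, 1400)
store.commit("my-group", 0, rewind_offset(checkpoints, now=103.5, rewind_seconds=2))
```

Resuming from the earlier offset means messages between the rewound offset and the last committed one are delivered again, so the downstream consumer has to tolerate duplicates, as the message notes.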


On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:

> I would like to throw in a couple use cases:
> 
> 
>   - Allow the new consumer to reset its offset to either the current
>   largest or smallest.  This would be a great way to restart a process that
>   has fallen behind.  The only way I know how to do this today, with the
>   high-level consumer, is to delete the ZK nodes manually and restart the
>   consumer.
>   - Allow the consumer to reset its offset to some arbitrary value, and
>   then write that offset into ZK.    Kind of like the first case, but would
>   make rewinding/replays much easier.
> 
> Modularity (the ability to layer the ZK infrastructure on top of the simple
> interface) would be great.
> 
> thanks,
> Evan
> 
> 
> On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
> 
>> This is a great summary Neha. It would be good to get people's feedback on
>> this since we don't want to keep breaking api and
>> protocol compatibility here, so the hope is to really get it right this
>> time now that we have really seen all the use cases and live with the
>> output for a while. I think the consumer design is a pretty hard protocol
>> and API design problem, so its fun to think about.
>> 
>> If I were to summarize Neha's requirements list, I think there are three
>> high-level goals:
>> 
>>  1. Simplify the consumer protocol to enable ease of development of
>>  consumer clients in other languages
>>  2. Try to replace the "simple consumer" and "high level consumer" with a
>>  single, general interface that has all the advantages of both.
>>  3. Support a bunch of use cases that either we didn't think of, or that
>>  weren't possible in the partitioning model of the pre-0.8 code base.
>> 
>> -Jay
>> 
>> 
>> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
>>> wrote:
>> 
>>> Hi,
>>> 
>>> Over the past few months, we've received quite a lot of feedback on the
>>> consumer side features and design. Some of them are improvements to the
>>> current consumer design and some are simply new feature/API requests. I
>>> have attempted to write up the requirements that I've heard on this wiki
>> -
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>>> 
>>> This would involve some significant changes to the consumer APIs, so we
>>> would like to collect feedback on the proposal from our community. Since
>>> the list of changes is not small, we would like to understand if some
>>> features are preferred over others, and more importantly, if some
>> features
>>> are not required at all.
>>> 
>>> Since some part of this proposal is experimental and the consumer side
>>> changes are non-trivial, we would like this initiative to not interfere
>>> with the forthcoming replication release. However, it will be good to
>> have
>>> people from the community give this some thought and help out with the
>>> JIRAs if interested. One way of managing this project could be creating a
>>> separate branch from the kafka trunk and continue development on it. Once
>>> it is ready and in good shape, we can think about cutting another release
>>> (after 0.8) for the releasing the new consumer API. Do people have
>>> preferences/concerns regarding creating a separate branch for this
>> project
>>> ?
>>> 
>>> Please feel free to start a discussion on this JIRA -
>>> https://issues.apache.org/jira/browse/KAFKA-364
>>> 
>>> Thanks,
>>> 
>>> Neha
>>> 
>> 
> 
> 
> 
> -- 
> --
> *Evan Chan*
> Senior Software Engineer |
> ev@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>


Re: Consumer re-design proposal

Posted by Jay Kreps <ja...@gmail.com>.
These are excellent points, thanks.

-Jay
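
For readers following the thread, here is a rough Python sketch of the layering Evan asks for in the message quoted below: a simple fetch-by-offset interface, with offset tracking and reset (smallest, largest, or an arbitrary value) layered on top. All names are hypothetical, the store is a plain dict standing in for ZK, and offsets are simple message indices for the sake of the example.

```python
class SimpleLog:
    """Hypothetical stand-in for one partition behind a 'simple' fetch API."""

    def __init__(self, messages, base_offset=0):
        self.messages = list(messages)
        self.base_offset = base_offset

    def smallest(self):
        return self.base_offset

    def largest(self):
        # Offset that the next appended message would receive.
        return self.base_offset + len(self.messages)

    def fetch(self, offset, max_messages):
        if not self.smallest() <= offset <= self.largest():
            raise ValueError("offset %d out of range" % offset)
        start = offset - self.base_offset
        return self.messages[start:start + max_messages]


class OffsetTrackingConsumer:
    """Layers offset storage on top of SimpleLog; 'store' is any dict-like object."""

    def __init__(self, log, store, key):
        self.log = log
        self.store = store
        self.key = key

    def poll(self, max_messages=10):
        offset = self.store.get(self.key, self.log.smallest())
        batch = self.log.fetch(offset, max_messages)
        self.store[self.key] = offset + len(batch)
        return batch

    def reset(self, to):
        """Reset to 'smallest', 'largest', or an explicit integer offset."""
        if to == "smallest":
            offset = self.log.smallest()
        elif to == "largest":
            offset = self.log.largest()
        else:
            offset = int(to)
            if not self.log.smallest() <= offset <= self.log.largest():
                raise ValueError("offset %d out of range" % offset)
        self.store[self.key] = offset


# Usage: read two messages, skip to the end, replay, then rewind to offset 1.
store = {}
consumer = OffsetTrackingConsumer(SimpleLog(["a", "b", "c", "d"]), store, "group/topic/0")
first = consumer.poll(2)
consumer.reset("largest")
at_end = consumer.poll()
consumer.reset("smallest")
replay = consumer.poll(10)
consumer.reset(1)
rewound = consumer.poll(2)
```

Because the reset only writes to the store, the same consumer could run without any offset storage at all by passing a throwaway dict, which is the modularity point raised in the quoted message.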

On Thu, Jun 14, 2012 at 1:39 PM, Evan Chan <ev...@ooyala.com> wrote:

> I would like to throw in a couple use cases:
>
>
>   - Allow the new consumer to reset its offset to either the current
>   largest or smallest.  This would be a great way to restart a process that
>   has fallen behind.  The only way I know how to do this today, with the
>   high-level consumer, is to delete the ZK nodes manually and restart the
>   consumer.
>   - Allow the consumer to reset its offset to some arbitrary value, and
>   then write that offset into ZK.    Kind of like the first case, but would
>   make rewinding/replays much easier.
>
> Modularity (the ability to layer the ZK infrastructure on top of the simple
> interface) would be great.
>
> thanks,
> Evan
>
>
> On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > This is a great summary Neha. It would be good to get people's feedback
> on
> > this since we don't want to keep breaking api and
> > protocol compatibility here, so the hope is to really get it right this
> > time now that we have really seen all the use cases and live with the
> > output for a while. I think the consumer design is a pretty hard protocol
> > and API design problem, so its fun to think about.
> >
> > If I were to summarize Neha's requirements list, I think there are three
> > high-level goals:
> >
> >   1. Simplify the consumer protocol to enable ease of development of
> >   consumer clients in other languages
> >   2. Try to replace the "simple consumer" and "high level consumer" with
> a
> >   single, general interface that has all the advantages of both.
> >   3. Support a bunch of use cases that either we didn't think of, or that
> >   weren't possible in the partitioning model of the pre-0.8 code base.
> >
> > -Jay
> >
> >
> > On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > Hi,
> > >
> > > Over the past few months, we've received quite a lot of feedback on the
> > > consumer side features and design. Some of them are improvements to the
> > > current consumer design and some are simply new feature/API requests. I
> > > have attempted to write up the requirements that I've heard on this
> wiki
> > -
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> > >
> > > This would involve some significant changes to the consumer APIs, so we
> > > would like to collect feedback on the proposal from our community.
> Since
> > > the list of changes is not small, we would like to understand if some
> > > features are preferred over others, and more importantly, if some
> > features
> > > are not required at all.
> > >
> > > Since some part of this proposal is experimental and the consumer side
> > > changes are non-trivial, we would like this initiative to not interfere
> > > with the forthcoming replication release. However, it will be good to
> > have
> > > people from the community give this some thought and help out with the
> > > JIRAs if interested. One way of managing this project could be
> creating a
> > > separate branch from the kafka trunk and continue development on it.
> Once
> > > it is ready and in good shape, we can think about cutting another
> release
> > > (after 0.8) for the releasing the new consumer API. Do people have
> > > preferences/concerns regarding creating a separate branch for this
> > project
> > > ?
> > >
> > > Please feel free to start a discussion on this JIRA -
> > > https://issues.apache.org/jira/browse/KAFKA-364
> > >
> > > Thanks,
> > >
> > > Neha
> > >
> >
>
>
>
> --
> --
> *Evan Chan*
> Senior Software Engineer |
> ev@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>
>

Re: Consumer re-design proposal

Posted by Marcos Juarez <mj...@gmail.com>.
Throwing a +1 on "Allow the consumer to reset its offset to some arbitrary value, and then write that offset into ZK".

We're currently running into a scenario where we would like 100% reliability, but we lose a few messages when a connection breaks while some messages are still sitting in the OS TCP buffers. So we're planning to shift the ZK offset a few seconds "back in time" when we detect that a broker has gone down, to make sure all messages are actually delivered to the end consumer when that broker comes back up, even if that means a small number of overlapping messages.
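As a rough illustration of the rewind idea, here is a minimal Python sketch. Everything in it is hypothetical: `rewind_offset` and `replay` are made-up helper names rather than Kafka or ZooKeeper APIs, the log is a plain list whose indexes stand in for offsets, and the duplicate filter is an optional extra, assumed for the case where the end consumer cannot tolerate the overlap.

```python
from typing import List, Set, Tuple


def rewind_offset(committed: int, margin: int, smallest: int = 0) -> int:
    """Move a committed offset back by `margin`, never below `smallest`."""
    return max(smallest, committed - margin)


def replay(log: List[str], start: int, seen: Set[int]) -> List[Tuple[int, str]]:
    """Re-read `log` from `start`, skipping offsets that were already processed.

    `seen` is updated in place so that a later replay does not redeliver
    the same offsets again.
    """
    delivered = []
    for offset in range(start, len(log)):
        if offset in seen:
            continue  # overlapping message, already handled before the failure
        seen.add(offset)
        delivered.append((offset, log[offset]))
    return delivered


if __name__ == "__main__":
    log = ["m0", "m1", "m2", "m3", "m4", "m5"]
    processed = {0, 1, 2}   # messages that actually reached the application
    committed = 5           # stored offset ran ahead of what was processed

    start = rewind_offset(committed, margin=3)
    print(start, replay(log, start, processed))
```

The margin has to be large enough to cover whatever might have been lost in flight; choosing it is a trade-off between replay cost and the risk of missing a message.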

Thanks,

Marcos


On Jun 14, 2012, at 2:39 PM, Evan Chan wrote:

> I would like to throw in a couple use cases:
> 
> 
>   - Allow the new consumer to reset its offset to either the current
>   largest or smallest.  This would be a great way to restart a process that
>   has fallen behind.  The only way I know how to do this today, with the
>   high-level consumer, is to delete the ZK nodes manually and restart the
>   consumer.
>   - Allow the consumer to reset its offset to some arbitrary value, and
>   then write that offset into ZK.    Kind of like the first case, but would
>   make rewinding/replays much easier.
> 
> Modularity (the ability to layer the ZK infrastructure on top of the simple
> interface) would be great.
> 
> thanks,
> Evan
> 
> 
> On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:
> 
>> This is a great summary Neha. It would be good to get people's feedback on
>> this since we don't want to keep breaking api and
>> protocol compatibility here, so the hope is to really get it right this
>> time now that we have really seen all the use cases and live with the
>> output for a while. I think the consumer design is a pretty hard protocol
>> and API design problem, so its fun to think about.
>> 
>> If I were to summarize Neha's requirements list, I think there are three
>> high-level goals:
>> 
>>  1. Simplify the consumer protocol to enable ease of development of
>>  consumer clients in other languages
>>  2. Try to replace the "simple consumer" and "high level consumer" with a
>>  single, general interface that has all the advantages of both.
>>  3. Support a bunch of use cases that either we didn't think of, or that
>>  weren't possible in the partitioning model of the pre-0.8 code base.
>> 
>> -Jay
>> 
>> 
>> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
>>> wrote:
>> 
>>> Hi,
>>> 
>>> Over the past few months, we've received quite a lot of feedback on the
>>> consumer side features and design. Some of them are improvements to the
>>> current consumer design and some are simply new feature/API requests. I
>>> have attempted to write up the requirements that I've heard on this wiki
>> -
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>>> 
>>> This would involve some significant changes to the consumer APIs, so we
>>> would like to collect feedback on the proposal from our community. Since
>>> the list of changes is not small, we would like to understand if some
>>> features are preferred over others, and more importantly, if some
>> features
>>> are not required at all.
>>> 
>>> Since some part of this proposal is experimental and the consumer side
>>> changes are non-trivial, we would like this initiative to not interfere
>>> with the forthcoming replication release. However, it will be good to
>> have
>>> people from the community give this some thought and help out with the
>>> JIRAs if interested. One way of managing this project could be creating a
>>> separate branch from the kafka trunk and continue development on it. Once
>>> it is ready and in good shape, we can think about cutting another release
>>> (after 0.8) for the releasing the new consumer API. Do people have
>>> preferences/concerns regarding creating a separate branch for this
>> project
>>> ?
>>> 
>>> Please feel free to start a discussion on this JIRA -
>>> https://issues.apache.org/jira/browse/KAFKA-364
>>> 
>>> Thanks,
>>> 
>>> Neha
>>> 
>> 
> 
> 
> 
> -- 
> --
> *Evan Chan*
> Senior Software Engineer |
> ev@ooyala.com | (650) 996-4600
> www.ooyala.com | blog <http://www.ooyala.com/blog> |
> @ooyala<http://www.twitter.com/ooyala>


Re: Consumer re-design proposal

Posted by Evan Chan <ev...@ooyala.com>.
I would like to throw in a couple of use cases:


   - Allow the new consumer to reset its offset to either the current
   largest or smallest.  This would be a great way to restart a process that
   has fallen behind.  The only way I know how to do this today, with the
   high-level consumer, is to delete the ZK nodes manually and restart the
   consumer.
   - Allow the consumer to reset its offset to some arbitrary value, and
   then write that offset into ZK.  This is similar to the first case, but
   would make rewinding/replays much easier.

Modularity (the ability to layer the ZK infrastructure on top of the simple
interface) would be great.
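To make the two use cases concrete, here is a minimal Python sketch of what such a reset call could look like. All names are hypothetical (`InMemoryOffsetStore`, `reset_offset`); this is not an existing Kafka or ZooKeeper API. The store is an in-memory stand-in for the ZK nodes, and the `smallest` and `largest` bounds are assumed to be supplied by the caller, since a real client would have to ask the broker for them.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# Hypothetical types for illustration only.
Partition = Tuple[str, int]  # (topic, partition id)


@dataclass
class InMemoryOffsetStore:
    """Stand-in for the ZK nodes where a consumer group keeps its offsets."""
    committed: Dict[Partition, int] = field(default_factory=dict)

    def commit(self, partition: Partition, offset: int) -> None:
        self.committed[partition] = offset

    def fetch(self, partition: Partition) -> int:
        return self.committed[partition]


def reset_offset(store: InMemoryOffsetStore, partition: Partition,
                 target: Union[str, int], smallest: int, largest: int) -> int:
    """Reset the stored offset to 'smallest', 'largest', or an explicit value."""
    if target == "smallest":
        new_offset = smallest
    elif target == "largest":
        new_offset = largest
    elif isinstance(target, int):
        if not smallest <= target <= largest:
            raise ValueError(f"offset {target} outside [{smallest}, {largest}]")
        new_offset = target
    else:
        raise ValueError(f"unknown reset target: {target!r}")
    store.commit(partition, new_offset)  # write the new position back
    return new_offset


if __name__ == "__main__":
    store = InMemoryOffsetStore()
    clicks = ("clicks", 0)
    store.commit(clicks, 120)

    # Case 1: a consumer that has fallen behind jumps to the end of the log.
    reset_offset(store, clicks, "largest", smallest=0, largest=500)
    # Case 2: rewind to an arbitrary position for a replay.
    reset_offset(store, clicks, 250, smallest=0, largest=500)
    print(store.fetch(clicks))
```

Passing the store in as an argument is one way to get the layering mentioned above: the reset logic does not care whether offsets live in ZK or somewhere else.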

thanks,
Evan


On Tue, Jun 12, 2012 at 9:59 AM, Jay Kreps <ja...@gmail.com> wrote:

> This is a great summary Neha. It would be good to get people's feedback on
> this since we don't want to keep breaking api and
> protocol compatibility here, so the hope is to really get it right this
> time now that we have really seen all the use cases and live with the
> output for a while. I think the consumer design is a pretty hard protocol
> and API design problem, so its fun to think about.
>
> If I were to summarize Neha's requirements list, I think there are three
> high-level goals:
>
>   1. Simplify the consumer protocol to enable ease of development of
>   consumer clients in other languages
>   2. Try to replace the "simple consumer" and "high level consumer" with a
>   single, general interface that has all the advantages of both.
>   3. Support a bunch of use cases that either we didn't think of, or that
>   weren't possible in the partitioning model of the pre-0.8 code base.
>
> -Jay
>
>
> On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > Hi,
> >
> > Over the past few months, we've received quite a lot of feedback on the
> > consumer side features and design. Some of them are improvements to the
> > current consumer design and some are simply new feature/API requests. I
> > have attempted to write up the requirements that I've heard on this wiki
> -
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
> >
> > This would involve some significant changes to the consumer APIs, so we
> > would like to collect feedback on the proposal from our community. Since
> > the list of changes is not small, we would like to understand if some
> > features are preferred over others, and more importantly, if some
> features
> > are not required at all.
> >
> > Since some part of this proposal is experimental and the consumer side
> > changes are non-trivial, we would like this initiative to not interfere
> > with the forthcoming replication release. However, it will be good to
> have
> > people from the community give this some thought and help out with the
> > JIRAs if interested. One way of managing this project could be creating a
> > separate branch from the kafka trunk and continue development on it. Once
> > it is ready and in good shape, we can think about cutting another release
> > (after 0.8) for the releasing the new consumer API. Do people have
> > preferences/concerns regarding creating a separate branch for this
> project
> > ?
> >
> > Please feel free to start a discussion on this JIRA -
> > https://issues.apache.org/jira/browse/KAFKA-364
> >
> > Thanks,
> >
> > Neha
> >
>



-- 
--
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>

Re: Consumer re-design proposal

Posted by Chris Burroughs <ch...@gmail.com>.
On 06/12/2012 12:59 PM, Jay Kreps wrote:
>    2. Try to replace the "simple consumer" and "high level consumer" with a
>    single, general interface that has all the advantages of both.

I've read through the wiki pages but think I'm missing the forest for
the trees.

For a consumer that wants "Manual partition assignment" and "Manual
offset management", what does the proposed design offer over the existing
SimpleConsumer?

Re: Consumer re-design proposal

Posted by Jay Kreps <ja...@gmail.com>.
This is a great summary, Neha. It would be good to get people's feedback on
this, since we don't want to keep breaking API and protocol compatibility
here. The hope is to really get it right this time, now that we have seen
all the use cases and lived with the output for a while. I think the
consumer design is a pretty hard protocol and API design problem, so it's
fun to think about.

If I were to summarize Neha's requirements list, I think there are three
high-level goals:

   1. Simplify the consumer protocol to enable ease of development of
   consumer clients in other languages
   2. Try to replace the "simple consumer" and "high level consumer" with a
   single, general interface that has all the advantages of both.
   3. Support a bunch of use cases that either we didn't think of, or that
   weren't possible in the partitioning model of the pre-0.8 code base.

-Jay


On Mon, Jun 11, 2012 at 4:52 PM, Neha Narkhede <ne...@gmail.com>wrote:

> Hi,
>
> Over the past few months, we've received quite a lot of feedback on the
> consumer side features and design. Some of them are improvements to the
> current consumer design and some are simply new feature/API requests. I
> have attempted to write up the requirements that I've heard on this wiki -
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>
> This would involve some significant changes to the consumer APIs, so we
> would like to collect feedback on the proposal from our community. Since
> the list of changes is not small, we would like to understand if some
> features are preferred over others, and more importantly, if some features
> are not required at all.
>
> Since some part of this proposal is experimental and the consumer side
> changes are non-trivial, we would like this initiative to not interfere
> with the forthcoming replication release. However, it will be good to have
> people from the community give this some thought and help out with the
> JIRAs if interested. One way of managing this project could be creating a
> separate branch from the kafka trunk and continue development on it. Once
> it is ready and in good shape, we can think about cutting another release
> (after 0.8) for the releasing the new consumer API. Do people have
> preferences/concerns regarding creating a separate branch for this project
> ?
>
> Please feel free to start a discussion on this JIRA -
> https://issues.apache.org/jira/browse/KAFKA-364
>
> Thanks,
>
> Neha
>
