Posted to dev@kafka.apache.org by Joe Stein <jo...@stealth.ly> on 2015/02/06 20:18:59 UTC

Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

I updated the installation and sample usage for the existing patches on the
KIP site
https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations

There are still a few pending items here.

1) There has already been some discussion about using the broker that is the
controller (https://issues.apache.org/jira/browse/KAFKA-1772), and we should
either elaborate on that more in this thread or agree that we are OK with the
admin client asking which broker is the controller and then sending the admin
tasks to that broker.

2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912, but we
can refactor after KAFKA-1694 is committed, no? I know folks just want to talk
to the broker that is the controller. It may even become useful to have the
controller run on a broker that isn't a topic broker at all (a small can of
worms I am opening here, but it builds on Guozhang's hot spot point).

3) Any more feedback?
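Item 1 and the hot spot discussion come down to how a client chooses the broker for an admin request. A minimal Java sketch of the two options debated in this thread (ask for the controller and talk to it directly, or send to any broker and let it forward), assuming a metadata response that includes the controller id as Jay suggests in point 4 below. `AdminRouting`, `Broker`, `ClusterMetadata`, `Mode`, and `pickTarget` are hypothetical names for illustration, not an existing Kafka API.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: choosing which broker receives an admin request. */
final class AdminRouting {

    /** Minimal stand-in for a broker entry returned by a metadata request. */
    static final class Broker {
        final int id;
        final String host;
        final int port;

        Broker(int id, String host, int port) {
            this.id = id;
            this.host = host;
            this.port = port;
        }
    }

    /** Assumed metadata shape: broker list plus the controller id (-1 if unknown). */
    static final class ClusterMetadata {
        final List<Broker> brokers;
        final int controllerId;

        ClusterMetadata(List<Broker> brokers, int controllerId) {
            this.brokers = brokers;
            this.controllerId = controllerId;
        }
    }

    enum Mode { DIRECT_TO_CONTROLLER, ANY_BROKER_PROXY }

    /**
     * DIRECT_TO_CONTROLLER: the client looks up the controller and sends to it.
     * ANY_BROKER_PROXY: the client sends to any broker (here simply the first),
     * which would be responsible for forwarding to the controller.
     */
    static Optional<Broker> pickTarget(ClusterMetadata md, Mode mode) {
        if (md.brokers.isEmpty()) {
            return Optional.empty();
        }
        if (mode == Mode.ANY_BROKER_PROXY) {
            return Optional.of(md.brokers.get(0));
        }
        for (Broker b : md.brokers) {
            if (b.id == md.controllerId) {
                return Optional.of(b);
            }
        }
        // Controller unknown or not in the list: caller should refresh metadata and retry.
        return Optional.empty();
    }
}
```

Returning an empty result when the controller is unknown leaves the retry policy to the caller, which is where the "full state machine" cost from point 5 shows up.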

- Joe Stein

On Fri, Jan 23, 2015 at 3:15 PM, Guozhang Wang <wa...@gmail.com> wrote:

> A centralized admin operation protocol would be very useful.
>
> One more general comment here is that the controller was originally designed
> to only talk to other brokers through ControllerChannel, while the broker
> instance that carries the current controller is agnostic of its existence
> and uses KafkaApis to handle general Kafka requests. Having all admin
> requests redirected to the controller instance will force the broker to be
> aware of the controller it carries and to access its internal data to handle
> these requests. Plus, with the number of clients outside Kafka's control,
> this could easily make the controller a hot spot in terms of request load.
>
>
> On Thu, Jan 22, 2015 at 10:09 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> > inline
> >
> > On Thu, Jan 22, 2015 at 11:59 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > Hey Joe,
> > >
> > > This is great. A few comments on KIP-4
> > >
> > > 1. This is much-needed functionality, but there are a lot of them, so
> > > let's really think these protocols through. We really want to end up
> > > with a set of well-thought-out, orthogonal APIs. For this reason I think
> > > it is really important to think through the end state, even if that
> > > includes APIs we won't implement in the first phase.
> > >
> >
> > ok
> >
> >
> > >
> > > 2. Let's please please please wait until we have switched the server over
> > > to the new Java protocol definitions. If we add umpteen more ad hoc Scala
> > > objects, that just generates more work for the conversion we know we
> > > have to do.
> > >
> >
> > ok :)
> >
> >
> > >
> > > 3. This proposal introduces a new type of optional parameter. This is
> > > inconsistent with everything else in the protocol where we use -1 or
> > > some other marker value. You could argue either way but let's stick
> > > with that for consistency. For clients that implemented the protocol in
> > > a better way than our scala code these basic primitives are hard to
> > > change.
> > >
> >
> > yes, less confusing, ok.
> >
> >
> > >
> > > 4. ClusterMetadata: This seems to duplicate TopicMetadataRequest, which
> > > has brokers, topics, and partitions. I think we should rename that
> > > request ClusterMetadataRequest (or just MetadataRequest) and include the
> > > id of the controller. Or are there other things we could add here?
> > >
> >
> > We could add broker version to it.
> >
> >
> > >
> > > 5. We have a tendency to try to make a lot of requests that can only go
> > > to particular nodes. This adds a lot of burden for client
> > > implementations (it sounds easy, but each discovery can fail in many
> > > ways, so it ends up being a full state machine to do right). I think we
> > > should consider making admin commands, and ideally as many of the other
> > > APIs as possible, available on all brokers and just redirect to the
> > > controller on the broker side. Perhaps there would be a general way to
> > > encapsulate this re-routing behavior.
> > >
> >
> > If we do that, then we should also preserve what we have and do both. The
> > client can then decide "do I want to go to any broker and proxy" or just
> > "go to controller and run admin task". Lots of folks have seen controllers
> > come under distress because of their producers/consumers. There is also a
> > ticket for controller election and re-election
> > https://issues.apache.org/jira/browse/KAFKA-1778 so you can force it to a
> > broker that has 0 load.
> >
> >
> > >
> > > 6. We should probably normalize the key value pairs used for configs
> > > rather than embedding a new formatting. So two strings rather than one
> > > with an internal equals sign.
> > >
> >
> > ok
> >
> >
> > >
> > > 7. Is the postcondition of these APIs that the command has begun or that
> > > the command has been completed? It is a lot more usable if the command
> > > has been completed so you know that if you create a topic and then
> > > publish to it you won't get an exception about there being no such
> > > topic.
> > >
> >
> > We should define that more. There needs to be some more state there, yes.
> >
> > We should try to cover https://issues.apache.org/jira/browse/KAFKA-1125
> > within what we come up with.
> >
> >
> > >
> > > 8. Describe topic and list topics duplicate a lot of stuff in the
> > > metadata request. Is there a reason to give back topics marked for
> > > deletion? I feel like if we just make the post-condition of the delete
> > > command be that the topic is deleted, that will get rid of the need for
> > > this, right? And it will be much more intuitive.
> > >
> >
> > I will go back and look through it.
> >
> >
> > >
> > > 9. Should we consider batching these requests? We have generally tried
> > > to allow multiple operations to be batched. My suspicion is that
> > > without this we will get a lot of code that does something like
> > >    for (String topic : adminClient.listTopics())
> > >       adminClient.describeTopic(topic);
> > > This code will work great when you test on 5 topics but not do as well
> > > if you have 50k.
> > >
> >
> > So the input is a list of topics (or none for all), and the output is a
> > single batch response from the controller (which could be routed through
> > another broker) covering the entire result? We could introduce a Batch
> > keyword to explicitly show the usage of it.
> >
> >
> > > 10. I think we should also discuss how we want to expose a programmatic
> > > JVM client API for these operations. Currently people rely on
> > > AdminUtils, which is totally sketchy. I think we probably need another
> > > client under clients/ that exposes administrative functionality. We will
> > > need this just to properly test the new APIs, I suspect. We should
> > > figure out that API.
> > >
> >
> > We were talking about that here
> > https://issues.apache.org/jira/browse/KAFKA-1774 and wrote it in java
> > https://reviews.apache.org/r/29301/diff/7/?page=4#75 so we could do
> > something like that, sure.
> >
> >
> > >
> > > 11. The other information that would be really useful to get would be
> > > information about partitions: how much data is in the partition, what
> > > are the segment offsets, what is the log-end offset (i.e. last offset),
> > > what is the compaction point, etc. I think that done right this would be
> > > the successor to the very awkward OffsetRequest we have today.
> > >
> >
> > yes!
> >
> >
> > >
> > > -Jay
> > >
> > > On Wed, Jan 21, 2015 at 10:27 PM, Joe Stein <jo...@stealth.ly> wrote:
> > >
> > > > Hi, created a KIP
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > > >
> > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1694
> > > >
> > > > /*******************************************
> > > >  Joe Stein
> > > >  Founder, Principal Consultant
> > > >  Big Data Open Source Security LLC
> > > >  http://www.stealth.ly
> > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > ********************************************/
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>
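Jay's point 9 in the quoted thread above is about round trips. A minimal Java sketch, assuming a hypothetical admin client (`FakeAdminClient`, with `listTopics`, `describeTopic`, and a batched `describeTopics`; none of these are real Kafka APIs) backed by an in-memory map and counting one simulated round trip per request. Unknown topics report -1, following the marker-value convention of point 3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting per-topic describes with one batched describe. */
final class BatchingSketch {

    /** In-memory stand-in for an admin client; counts simulated round trips. */
    static final class FakeAdminClient {
        private final Map<String, Integer> partitionsByTopic;
        int roundTrips = 0;

        FakeAdminClient(Map<String, Integer> partitionsByTopic) {
            this.partitionsByTopic = partitionsByTopic;
        }

        List<String> listTopics() {
            roundTrips++;
            return new ArrayList<>(partitionsByTopic.keySet());
        }

        /** One request per topic: the pattern point 9 warns about. */
        int describeTopic(String topic) {
            roundTrips++;
            return partitionsByTopic.getOrDefault(topic, -1);
        }

        /** One request for many topics; an empty list means "all topics". */
        Map<String, Integer> describeTopics(List<String> topics) {
            roundTrips++;
            if (topics.isEmpty()) {
                return new HashMap<>(partitionsByTopic);
            }
            Map<String, Integer> out = new HashMap<>();
            for (String t : topics) {
                out.put(t, partitionsByTopic.getOrDefault(t, -1));
            }
            return out;
        }
    }

    /** N + 1 requests: list, then describe each topic individually. */
    static Map<String, Integer> describeOneByOne(FakeAdminClient client) {
        Map<String, Integer> out = new HashMap<>();
        for (String topic : client.listTopics()) {
            out.put(topic, client.describeTopic(topic));
        }
        return out;
    }

    /** 1 request: describe everything in a single batch. */
    static Map<String, Integer> describeBatched(FakeAdminClient client) {
        return client.describeTopics(List.of());
    }
}
```

With N topics the per-topic loop costs N + 1 requests while the batched call costs one, which is the 5-versus-50k difference the point describes. The empty-list-means-all convention mirrors Joe's "(or none for all)" reply.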

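Points 3 and 6 in the quoted thread above are about wire encoding. A minimal Java sketch of a hypothetical create-topic body (`CreateTopicSketch` and its field layout are illustrative, not a real Kafka request), assuming Kafka-style primitives: int16-length-prefixed UTF-8 strings and int32-count arrays. An unset replication factor is sent as the -1 marker instead of a new optional-field type, and each config travels as two strings instead of one `key=value` string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical create-topic body; layout is illustrative only. */
final class CreateTopicSketch {
    /** Marker value meaning "not specified; use the broker default" (point 3). */
    static final int USE_DEFAULT = -1;

    final String topic;
    final int partitions;
    final int replicationFactor;
    final Map<String, String> configs; // one (key, value) string pair per entry (point 6)

    CreateTopicSketch(String topic, int partitions, int replicationFactor, Map<String, String> configs) {
        this.topic = topic;
        this.partitions = partitions;
        this.replicationFactor = replicationFactor;
        this.configs = configs;
    }

    // Assumes every string fits in an int16 length prefix.
    private static int sizeOf(String s) {
        return 2 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    private static void writeString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putShort((short) bytes.length);
        buf.put(bytes);
    }

    private static String readString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getShort()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    ByteBuffer encode() {
        int size = sizeOf(topic) + 4 + 4 + 4; // topic, partitions, replication, config count
        for (Map.Entry<String, String> e : configs.entrySet()) {
            size += sizeOf(e.getKey()) + sizeOf(e.getValue());
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        writeString(buf, topic);
        buf.putInt(partitions);
        buf.putInt(replicationFactor);
        buf.putInt(configs.size());
        for (Map.Entry<String, String> e : configs.entrySet()) {
            writeString(buf, e.getKey());   // key and value are separate strings,
            writeString(buf, e.getValue()); // so no "key=value" parsing is needed
        }
        buf.flip();
        return buf;
    }

    static CreateTopicSketch decode(ByteBuffer buf) {
        String topic = readString(buf);
        int partitions = buf.getInt();
        int replicationFactor = buf.getInt();
        int n = buf.getInt();
        Map<String, String> configs = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            String key = readString(buf);
            configs.put(key, readString(buf));
        }
        return new CreateTopicSketch(topic, partitions, replicationFactor, configs);
    }
}
```

Because keys and values are separate length-prefixed strings, a value containing `=` needs no escaping convention.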
Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

Posted by Chi Hoang <ch...@groupon.com>.
For the "Sample usage" section, please consider
https://github.com/airbnb/kafkat.  We find that tool to be very easy to
use, and extremely useful for our administration tasks.

Chi

On Mon, Feb 9, 2015 at 9:03 AM, Guozhang Wang <wa...@gmail.com> wrote:

> I feel the benefits of lowering the development bar for new clients are
> not worth the complexity we would need to introduce on the server side, as
> today the clients need just one more request type (metadata request) to
> send produce / fetch requests to the right brokers, whereas a re-routing
> mechanism would result in complicated between-broker communication
> patterns that could impact Kafka performance and make debugging /
> troubleshooting much harder.
>
> An alternative way to ease the development of clients is to use a proxy
> in front of the Kafka servers, like the REST proxy we have built before,
> which we use primarily for non-Java clients but which can also be seen as
> handling cluster metadata discovery for clients. Compared to the
> re-routing idea, the proxy also introduces two hops, but its layered
> architecture is simpler.
>
> Guozhang
>
>
> On Sun, Feb 8, 2015 at 8:00 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Hey Jiangjie,
> >
> > Rerouting support doesn't force clients to use it. Java and all existing
> > clients would work as they do now, where requests are intelligently routed
> > by the client, but this would lower the bar for new clients. That said, I
> > agree the case for rerouting admin commands is much stronger than for
> > data.
> >
> > The idea of separating admin/metadata from data would definitely solve
> > some problems, but it would also add a lot of complexity: new ports,
> > thread pools, etc. This is an interesting idea to think over, but I'm not
> > sure if it's worth it. Probably a separate effort in any case.
> >
> > -jay
> >
> > On Friday, February 6, 2015, Jiangjie Qin <jq...@linkedin.com.invalid> wrote:
> >
> > > I'm a little bit concerned about routing requests among brokers.
> > > Typically produce and fetch requests/responses make up a dominant
> > > percentage of traffic, and routing them from one broker to another seems
> > > undesirable. Also, I think we generally have two types of
> > > requests/responses: data related and admin related. It is typically good
> > > practice to separate the data plane from the control plane. That suggests
> > > we should have another admin port to serve those admin requests, and
> > > probably different authentication/authorization from the data port.
> > >
> > > Jiangjie (Becket) Qin
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

Posted by Guozhang Wang <wa...@gmail.com>.
I feel the benefits of lowering the development bar for new clients are
not worth the complexity we would need to introduce on the server side, as
today the clients need just one more request type (metadata request) to
send produce / fetch requests to the right brokers, whereas a re-routing
mechanism would result in complicated between-broker communication
patterns that could impact Kafka performance and make debugging /
troubleshooting much harder.
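To make the paragraph above concrete: once a client has fetched metadata, routing produce requests is mostly a lookup and a group-by. A minimal Java sketch, assuming the partition-to-leader map has already been obtained from a metadata response; `LeaderRoutingSketch`, `Record`, and the `topic-partition` string key are hypothetical names for illustration, not Kafka client classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of client-side routing driven only by metadata. */
final class LeaderRoutingSketch {
    static final int UNKNOWN_LEADER = -1;

    static final class Record {
        final String topic;
        final int partition;
        final String value;

        Record(String topic, int partition, String value) {
            this.topic = topic;
            this.partition = partition;
            this.value = value;
        }
    }

    static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    /**
     * Groups records by the broker id that leads their partition, so the client
     * can send one produce request per broker. leaderByPartition stands in for
     * what a metadata response would supply (keys built with key()).
     */
    static Map<Integer, List<Record>> groupByLeader(List<Record> records,
                                                    Map<String, Integer> leaderByPartition) {
        Map<Integer, List<Record>> byBroker = new TreeMap<>();
        for (Record r : records) {
            int leader = leaderByPartition.getOrDefault(key(r.topic, r.partition), UNKNOWN_LEADER);
            byBroker.computeIfAbsent(leader, id -> new ArrayList<>()).add(r);
        }
        return byBroker;
    }
}
```

Partitions with no known leader are collected under -1, following the marker-value convention from earlier in the thread; a real client would refresh metadata and retry them.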

An alternative way to ease the development of clients is to use a proxy
in front of the Kafka servers, like the REST proxy we have built before,
which we use primarily for non-Java clients but which can also be seen as
handling cluster metadata discovery for clients. Compared to the
re-routing idea, the proxy also introduces two hops, but its layered
architecture is simpler.

Guozhang


On Sun, Feb 8, 2015 at 8:00 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey Jiangjie,
>
> Re routing support doesn't force clients to use it. Java and all existing
> clients would work as now where request are intelligently routed by the
> client, but this would lower the bar for new clients. That said I agree the
> case for reroute get admin commands is much stronger than data.
>
> The idea of separating admin/metadata from would definitely solve some
> problems but it would also add a lot of complexity--new ports, thread
> pools, etc. this is an interesting idea to think over but I'm not sure if
> it's worth it. Probably a separate effort in any case.
>
> -jay
>
> On Friday, February 6, 2015, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > I¹m a little bit concerned about the request routers among brokers.
> > Typically we have a dominant percentage of produce and fetch
> > request/response. Routing them from one broker to another seems not
> wanted.
> > Also I think we generally have two types of requests/responses: data
> > related and admin related. It is typically a good practice to separate
> > data plain from control plain. That suggests we should have another admin
> > port to serve those admin requests and probably have different
> > authentication/authorization from the data port.
> >
> > Jiangjie (Becket) Qin
> >
> > On 2/6/15, 11:18 AM, "Joe Stein" <jo...@stealth.ly> wrote:
> >
> > >I updated the installation and sample usage for the existing patches on
> > >the
> > >KIP site
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and
> > >+centralized+administrative+operations
> > >
> > >There are still a few pending items here.
> > >
> > >1) There was already some discussion about using the Broker that is the
> > >Controller here https://issues.apache.org/jira/browse/KAFKA-1772 and we
> > >should elaborate on that more in the thread or agree we are ok with
> admin
> > >asking for the controller to talk to and then just sending that broker
> the
> > >admin tasks.
> > >
> > >2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912
> but
> > >we
> > >can refactor after KAFK-1694 committed, no? I know folks just want to
> talk
> > >to the broker that is the controller. It may even become useful to have
> > >the
> > >controller run on a broker that isn't even a topic broker anymore (small
> > >can of worms I am opening here but it elaborates on Guozhang's hot spot
> > >point.
> > >
> > >3) anymore feedback?
> > >
> > >- Joe Stein
> > >
> > >On Fri, Jan 23, 2015 at 3:15 PM, Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > >> A centralized admin operation protocol would be very useful.
> > >>
> > >> One more general comment here is that controller is originally
> designed
> > >>to
> > >> only talk to other brokers through ControllerChannel, while the broker
> > >> instance which carries the current controller is agnostic of its
> > >>existence,
> > >> and use KafkaApis to handle general Kafka requests. Having all admin
> > >> requests redirected to the controller instance will force the broker
> to
> > >>be
> > >> aware of its carried controller, and access its internal data for
> > >>handling
> > >> these requests. Plus with the number of clients out of Kafka's
> control,
> > >> this may easily cause the controller to be a hot spot in terms of
> > >>request
> > >> load.
> > >>
> > >>
> > >> On Thu, Jan 22, 2015 at 10:09 PM, Joe Stein <jo...@stealth.ly>
> > >>wrote:
> > >>
> > >> > inline
> > >> >
> > >> > On Thu, Jan 22, 2015 at 11:59 PM, Jay Kreps <ja...@gmail.com>
> > >>wrote:
> > >> >
> > >> > > Hey Joe,
> > >> > >
> > >> > > This is great. A few comments on KIP-4
> > >> > >
> > >> > > 1. This is much needed functionality, but there are a lot of the
> so
> > >> let's
> > >> > > really think these protocols through. We really want to end up
> with
> > >>a
> > >> set
> > >> > > of well thought-out, orthoganol apis. For this reason I think it
> is
> > >> > really
> > >> > > important to think through the end state even if that includes
> APIs
> > >>we
> > >> > > won't implement in the first phase.
> > >> > >
> > >> >
> > >> > ok
> > >> >
> > >> >
> > >> > >
> > >> > > 2. Let's please please please wait until we have switched the
> server
> > >> over
> > >> > > to the new java protocol definitions. If we add upteen more ad hoc
> > >> scala
> > >> > > objects that is just generating more work for the conversion we
> > >>know we
> > >> > > have to do.
> > >> > >
> > >> >
> > >> > ok :)
> > >> >
> > >> >
> > >> > >
> > >> > > 3. This proposal introduces a new type of optional parameter. This
> > >>is
> > >> > > inconsistent with everything else in the protocol where we use -1
> or
> > >> some
> > >> > > other marker value. You could argue either way but let's stick
> with
> > >> that
> > >> > > for consistency. For clients that implemented the protocol in a
> > >>better
> > >> > way
> > >> > > than our scala code these basic primitives are hard to change.
> > >> > >
> > >> >
> > >> > yes, less confusing, ok.
> > >> >
> > >> >
> > >> > >
> > >> > > 4. ClusterMetadata: This seems to duplicate TopicMetadataRequest
> > >>which
> > >> > has
> > >> > > brokers, topics, and partitions. I think we should rename that
> > >>request
> > >> > > ClusterMetadataRequest (or just MetadataRequest) and include the
> id
> > >>of
> > >> > the
> > >> > > controller. Or are there other things we could add here?
> > >> > >
> > >> >
> > >> > We could add broker version to it.
> > >> >
> > >> >
> > >> > >
> > >> > > 5. We have a tendency to try to make a lot of requests that can
> > >>only go
> > >> > to
> > >> > > particular nodes. This adds a lot of burden for client
> > >>implementations
> > >> > (it
> > >> > > sounds easy but each discovery can fail in many parts so it ends
> up
> > >> > being a
> > >> > > full state machine to do right). I think we should consider making
> > >> admin
> > >> > > commands and ideally as many of the other apis as possible
> > >>available on
> > >> > all
> > >> > > brokers and just redirect to the controller on the broker side.
> > >>Perhaps
> > >> > > there would be a general way to encapsulate this re-routing
> > >>behavior.
> > >> > >
> > >> >
> > >> > If we do that then we should also preserve what we have and do both.
> > >>The
> > >> > client can then decide "do I want to go to any broker and proxy" or
> > >>just
> > >> > "go to controller and run admin task". Lots of folks have seen
> > >> controllers
> > >> > come under distress because of their producers/consumers. There is
> > >>ticket
> > >> > too for controller elect and re-elect
> > >> > https://issues.apache.org/jira/browse/KAFKA-1778 so you can force
> it
> > >>to
> > >> a
> > >> > broker that has 0 load.
> > >> >
> > >> >
> > >> > >
> > >> > > 6. We should probably normalize the key value pairs used for
> configs
> > >> > rather
> > >> > > than embedding a new formatting. So two strings rather than one
> > >>with an
> > >> > > internal equals sign.
> > >> > >
> > >> >
> > >> > ok
> > >> >
> > >> >
> > >> > >
> > >> > > 7. Is the postcondition of these APIs that the command has begun
> or
> > >> that
> > >> > > the command has been completed? It is a lot more usable if the
> > >>command
> > >> > has
> > >> > > been completed so you know that if you create a topic and then
> > >>publish
> > >> to
> > >> > > it you won't get an exception about there being no such topic.
> > >> > >
> > >> >
> > >> > We should define that more. There needs to be some more state there,
> > >>yes.
> > >> >
> > >> > We should try to cover
> > >>https://issues.apache.org/jira/browse/KAFKA-1125
> > >> > within what we come up with.
> > >> >
> > >> >
> > >> > >
> > >> > > 8. Describe topic and list topics duplicate a lot of stuff in the
> > >> > metadata
> > >> > > request. Is there a reason to give back topics marked for
> deletion?
> > >>I
> > >> > feel
> > >> > > like if we just make the post-condition of the delete command be
> > >>that
> > >> the
> > >> > > topic is deleted that will get rid of the need for this right? And
> > >>it
> > >> > will
> > >> > > be much more intuitive.
> > >> > >
> > >> >
> > >> > I will go back and look through it.
> > >> >
> > >> >
> > >> > >
> > >> > > 9. Should we consider batching these requests? We have generally
> > >>tried
> > >> to
> > >> > > allow multiple operations to be batched. My suspicion is that
> > >>without
> > >> > this
> > >> > > we will get a lot of code that does something like
> > >> > >    for(topic: adminClient.listTopics())
> > >> > >       adminClient.describeTopic(topic)
> > >> > > this code will work great when you test on 5 topics but not do as
> > >>well
> > >> if
> > >> > > you have 50k.
> > >> > >
> > >> >
> > >> > So => Input is a list of topics (or none for all) and a batch
> response
> > >> from
> > >> > the controller (which could be routed through another broker) of the
> > >> entire
> > >> > response? We could introduce a Batch keyword to explicitly show the
> > >>usage
> > >> > of it.
> > >> >
> > >> >
> > >> > > 10. I think we should also discuss how we want to expose a
> > >> > > programmatic JVM client api for these operations. Currently people
> > >> > > rely on AdminUtils which is totally sketchy. I think we probably need
> > >> > > another client under clients/ that exposes administrative
> > >> > > functionality. We will need this just to properly test the new apis,
> > >> > > I suspect. We should figure out that API.
> > >> > >
> > >> >
> > >> > We were talking about that here
> > >> > https://issues.apache.org/jira/browse/KAFKA-1774 and wrote it in java
> > >> > https://reviews.apache.org/r/29301/diff/7/?page=4#75 so we could do
> > >> > something like that, sure.
> > >> >
> > >> >
> > >> > >
> > >> > > 11. The other information that would be really useful to get would
> > >> > > be information about partitions--how much data is in the partition,
> > >> > > what are the segment offsets, what is the log-end offset (i.e. last
> > >> > > offset), what is the compaction point, etc. I think that done right
> > >> > > this would be the successor to the very awkward OffsetRequest we have
> > >> > > today.
> > >> > >
> > >> >
> > >> > yes!
> > >> >
> > >> >
> > >> > >
> > >> > > -Jay
> > >> > >
> > >> > > On Wed, Jan 21, 2015 at 10:27 PM, Joe Stein <joe.stein@stealth.ly>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi, created a KIP
> > >> > > >
> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > >> > > >
> > >> > > > JIRA https://issues.apache.org/jira/browse/KAFKA-1694
> > >> > > >
> > >> > > > /*******************************************
> > >> > > >  Joe Stein
> > >> > > >  Founder, Principal Consultant
> > >> > > >  Big Data Open Source Security LLC
> > >> > > >  http://www.stealth.ly
> > >> > > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > >> > > > ********************************************/
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> >
> >
>



-- 
-- Guozhang
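Points 7 and 8 in the quoted thread ask whether an admin command should return once it has begun or only once it has completed. Here is a rough client-side sketch. It assumes the client can read the current set of topic names from cluster metadata; the metadataTopics supplier is hypothetical, not a Kafka API. The helper polls until the postcondition holds: the topic is visible after a create, or gone after a delete. This only approximates a server-side "completed" guarantee and is not what KIP-4 specifies.

```java
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Client-side sketch: treat an admin command as "complete" only once its
// postcondition is visible in metadata. metadataTopics is a hypothetical source.
class PostconditionSketch {

    // Re-checks `done` every pollMs milliseconds; returns false if timeoutMs passes first.
    static boolean awaitCondition(BooleanSupplier done, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return false;
            }
        }
        return true;
    }

    // After a create: done once the topic shows up in metadata (point 7).
    static boolean awaitTopicCreated(Supplier<Set<String>> metadataTopics, String topic, long timeoutMs) {
        return awaitCondition(() -> metadataTopics.get().contains(topic), timeoutMs, 10);
    }

    // After a delete: done once the topic is gone entirely, so callers never
    // observe a "marked for deletion" intermediate state (point 8).
    static boolean awaitTopicDeleted(Supplier<Set<String>> metadataTopics, String topic, long timeoutMs) {
        return awaitCondition(() -> !metadataTopics.get().contains(topic), timeoutMs, 10);
    }
}
```

If the server itself guaranteed completion before responding, none of this polling would be needed. That is the usability argument made in point 7.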

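The following Java sketch makes the cost described in point 9 concrete. It compares the list-then-describe loop with a single batched describe call, where an empty topic list means "all topics," as Joe suggests. HypotheticalAdmin and CountingAdmin are illustrative stand-ins, not real Kafka classes. The fake implementation simply counts simulated round trips.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: HypotheticalAdmin is not a Kafka interface.
class BatchingSketch {

    interface HypotheticalAdmin {
        List<String> listTopics();

        // Batched form: an empty list means "describe all topics" in one round trip.
        Map<String, String> describeTopics(List<String> topics);
    }

    // In-memory fake that counts each call as one network round trip.
    static class CountingAdmin implements HypotheticalAdmin {
        final List<String> topics;
        int roundTrips = 0;

        CountingAdmin(List<String> topics) {
            this.topics = topics;
        }

        public List<String> listTopics() {
            roundTrips++;
            return new ArrayList<>(topics);
        }

        public Map<String, String> describeTopics(List<String> requested) {
            roundTrips++;
            List<String> wanted = requested.isEmpty() ? topics : requested;
            Map<String, String> out = new HashMap<>();
            for (String t : wanted) {
                out.put(t, "description of " + t);
            }
            return out;
        }
    }

    // The pattern point 9 warns about: 1 + N round trips.
    static Map<String, String> describeOneByOne(HypotheticalAdmin admin) {
        Map<String, String> out = new HashMap<>();
        for (String topic : admin.listTopics()) {
            out.putAll(admin.describeTopics(Collections.singletonList(topic)));
        }
        return out;
    }

    // The batched alternative: one request covering every topic.
    static Map<String, String> describeBatched(HypotheticalAdmin admin) {
        return admin.describeTopics(Collections.emptyList());
    }

    static List<String> topicNames(int n) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            names.add("topic-" + i);
        }
        return names;
    }
}
```

With 50 topics, the loop makes 51 round trips and the batched call makes 1. Both return the same descriptions. At 50k topics, that gap is what point 9 is warning about.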
[DISCUSS] KIP-4 - Command line and centralized administrative operations

Posted by Jay Kreps <ja...@gmail.com>.
Hey Jiangjie,

Re-routing support doesn't force clients to use it. Java and all existing
clients would work as they do now, where requests are intelligently routed
by the client, but this would lower the bar for new clients. That said, I
agree the case for rerouting admin commands is much stronger than for data.
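Here is a minimal sketch of the broker-side re-routing under discussion. It assumes each broker knows the current controller id from metadata. Any broker accepts an admin request and either executes it (if it is the controller) or forwards it. Broker, AdminRequest and the forwarding function are hypothetical names. Real forwarding would be a network call with its own failure handling.

```java
import java.util.function.Function;

// Illustrative only: none of these types are Kafka classes.
class RerouteSketch {

    static final class AdminRequest {
        final String command;

        AdminRequest(String command) {
            this.command = command;
        }
    }

    static final class Broker {
        final int brokerId;
        final int controllerId; // assumed to be learned from cluster metadata
        final Function<AdminRequest, String> forwardToController;

        Broker(int brokerId, int controllerId, Function<AdminRequest, String> forwardToController) {
            this.brokerId = brokerId;
            this.controllerId = controllerId;
            this.forwardToController = forwardToController;
        }

        // Clients may send admin requests to any broker; only the controller executes them.
        String handle(AdminRequest request) {
            if (brokerId == controllerId) {
                return "executed '" + request.command + "' on controller " + brokerId;
            }
            return forwardToController.apply(request);
        }
    }
}
```

A client that prefers to talk to the controller directly can still do so. The redirect only saves simple clients from writing the discovery state machine themselves.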

The idea of separating admin/metadata from data would definitely solve some
problems, but it would also add a lot of complexity: new ports, thread
pools, etc. This is an interesting idea to think over, but I'm not sure
it's worth it. Probably a separate effort in any case.

-jay

On Friday, February 6, 2015, Jiangjie Qin <jq...@linkedin.com.invalid> wrote:

> I'm a little bit concerned about request routing among brokers.
> Produce and fetch requests/responses typically make up the dominant share
> of traffic, and routing them from one broker to another seems undesirable.
> Also, I think we generally have two types of requests/responses: data
> related and admin related. It is typically good practice to separate the
> data plane from the control plane. That suggests we should have a separate
> admin port to serve those admin requests, probably with different
> authentication/authorization from the data port.
>
> Jiangjie (Becket) Qin

Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.
I'm a little bit concerned about request routing among brokers.
Produce and fetch requests/responses typically make up the dominant share
of traffic, and routing them from one broker to another seems undesirable.
Also, I think we generally have two types of requests/responses: data
related and admin related. It is typically good practice to separate the
data plane from the control plane. That suggests we should have a separate
admin port to serve those admin requests, probably with different
authentication/authorization from the data port.
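Jiangjie's suggestion can be pictured as a mapping from request type to listener. The sketch below assumes a hypothetical enum of request kinds and two illustrative port numbers. It is not Kafka's actual request classification. METADATA is placed on the data side here because producers and consumers need it, but that placement is itself a judgment call.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: request kinds and ports are hypothetical examples.
class PlaneSeparationSketch {

    enum RequestKind { PRODUCE, FETCH, METADATA, CREATE_TOPIC, DELETE_TOPIC, ALTER_CONFIG }

    // Requests served by the admin (control-plane) listener.
    static final Set<RequestKind> ADMIN_REQUESTS =
            EnumSet.of(RequestKind.CREATE_TOPIC, RequestKind.DELETE_TOPIC, RequestKind.ALTER_CONFIG);

    static final int DATA_PORT = 9092;  // illustrative
    static final int ADMIN_PORT = 9093; // illustrative

    static int portFor(RequestKind kind) {
        return ADMIN_REQUESTS.contains(kind) ? ADMIN_PORT : DATA_PORT;
    }

    // Each listener rejects requests from the other plane, so each can have its own
    // thread pool and its own authentication/authorization settings.
    static boolean accepts(int listenerPort, RequestKind kind) {
        return portFor(kind) == listenerPort;
    }
}
```

Jay's reply names the cost of this design: every extra listener brings its own port, thread pool and configuration to manage.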

Jiangjie (Becket) Qin

On 2/6/15, 11:18 AM, "Joe Stein" <jo...@stealth.ly> wrote:

>I updated the installation and sample usage for the existing patches on
>the KIP site
>https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
>
>There are still a few pending items here.
>
>1) There was already some discussion about using the Broker that is the
>Controller here https://issues.apache.org/jira/browse/KAFKA-1772 and we
>should elaborate on that more in the thread or agree we are ok with admin
>asking for the controller to talk to and then just sending that broker the
>admin tasks.
>
>2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912 but
>we can refactor after KAFKA-1694 is committed, no? I know folks just want
>to talk to the broker that is the controller. It may even become useful to
>have the controller run on a broker that isn't even a topic broker anymore
>(small can of worms I am opening here, but it elaborates on Guozhang's hot
>spot point).
>
>3) Any more feedback?
>
>- Joe Stein


Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

Posted by Jay Kreps <ja...@gmail.com>.
Hey Joe,

I think this is proposing several things:
1. A new command line utility. This isn't really fully specified here.
There is sample usage, but I don't really understand what all the
commands will be. Also, presumably this will replace the existing shell
scripts, right? We obviously don't want to be in a state where we have
both...
2. A new set of language-agnostic administrative protocols.
3. A new Java API for issuing administrative requests using the protocol. I
don't see any discussion of what this will look like.

It might be easiest to tackle these one at a time, no? If not, we really do
need a complete description of each layer, as these are pretty core
public APIs.
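One way to read Jay's three layers is as a stack where each layer is a thin wrapper over the one below. The command line tool parses arguments and calls the Java API, which builds language-agnostic protocol requests. The sketch below is purely illustrative: CreateTopicRequest, AdminApi, Transport and the "--topic"/"--partitions"/"--replication-factor"/"--config" flags are hypothetical names, not the KIP-4 design. It also keeps configs as separate key and value strings at the protocol layer and splits "key=value" only at the command line.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every class, method and flag name here is hypothetical.
class LayeringSketch {

    // Layer 2: a language-agnostic protocol request, modeled as plain fields.
    static final class CreateTopicRequest {
        final String topic;
        final int partitions;
        final int replicationFactor;
        final Map<String, String> configs; // separate key and value strings, no embedded '='

        CreateTopicRequest(String topic, int partitions, int replicationFactor,
                           Map<String, String> configs) {
            this.topic = topic;
            this.partitions = partitions;
            this.replicationFactor = replicationFactor;
            this.configs = Collections.unmodifiableMap(new HashMap<>(configs));
        }
    }

    // Whatever actually sends bytes to a broker; left abstract in this sketch.
    interface Transport {
        void send(CreateTopicRequest request);
    }

    // Layer 3: the Java API builds protocol requests and hands them to the transport.
    static final class AdminApi {
        private final Transport transport;

        AdminApi(Transport transport) {
            this.transport = transport;
        }

        void createTopic(String topic, int partitions, int replicationFactor,
                         Map<String, String> configs) {
            transport.send(new CreateTopicRequest(topic, partitions, replicationFactor, configs));
        }
    }

    // Layer 1: the command line tool only parses "--flag value" pairs and calls the Java API.
    static void runCreateCommand(String[] args, AdminApi api) {
        Map<String, String> opts = new HashMap<>();
        Map<String, String> configs = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].equals("--config")) {
                // "key=value" is a command line convenience; it is split before reaching the protocol.
                String[] kv = args[i + 1].split("=", 2);
                configs.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                opts.put(args[i], args[i + 1]);
            }
        }
        api.createTopic(opts.get("--topic"),
                Integer.parseInt(opts.getOrDefault("--partitions", "1")),
                Integer.parseInt(opts.getOrDefault("--replication-factor", "1")),
                configs);
    }
}
```

Because the CLI holds no logic of its own, replacing the existing shell scripts would mostly mean mapping their flags onto the Java API.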

-Jay

On Fri, Feb 6, 2015 at 11:18 AM, Joe Stein <jo...@stealth.ly> wrote:

> I updated the installation and sample usage for the existing patches on the
> KIP site
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
>
> There are still a few pending items here.
>
> 1) There was already some discussion about using the Broker that is the
> Controller here https://issues.apache.org/jira/browse/KAFKA-1772 and we
> should elaborate on that more in the thread or agree we are ok with admin
> asking for the controller to talk to and then just sending that broker the
> admin tasks.
>
> 2) I like this idea https://issues.apache.org/jira/browse/KAFKA-1912 but
> we can refactor after KAFKA-1694 is committed, no? I know folks just want
> to talk to the broker that is the controller. It may even become useful to
> have the controller run on a broker that isn't even a topic broker anymore
> (small can of worms I am opening here, but it elaborates on Guozhang's hot
> spot point).
>
> 3) Any more feedback?
>
> - Joe Stein