Posted to dev@kafka.apache.org by Manikumar <ma...@gmail.com> on 2018/07/12 16:56:25 UTC

KIP-327: Add describe all topics API to AdminClient

Hi all,

I have created a KIP to add a describe all topics API to AdminClient.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient

Please take a look.

Thanks,

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Colin McCabe <cm...@apache.org>.
Hi Stephane,

Pagination would be useful.  But I think the more immediate need is to stop sending stuff over the wire that we don't even use.

For example, imagine that you have a cluster with 50,000 topics and your Consumer subscribes to abracadabra*.  Perhaps there are actually only 3 topics that match that regular expression.  But with the current system, the broker would send all 50,000 topics over the wire to the client.  Then the client applies the regular expression, throws away 49,997 of those entries, and just uses the remaining 3.

With pagination, you still have a huge load on the network and the broker from sending all that unnecessary data.  With server-side regular expressions, you would send only the stuff you need.
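
To make the numbers above concrete, here is a minimal sketch in plain Java.  It uses no Kafka classes: the class name, the helper methods and the fake topic names are all made up for illustration, and the subscription is written as the Java regex abracadabra.* on the assumption that a prefix match is what was meant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ClientSideFilterSketch {

    // Builds a made-up topic list: `total` names, of which the first
    // `matching` start with "abracadabra". Purely illustrative data.
    static List<String> fakeTopicNames(int total, int matching) {
        List<String> names = new ArrayList<>(total);
        for (int i = 0; i < matching; i++) {
            names.add("abracadabra-" + i);
        }
        for (int i = matching; i < total; i++) {
            names.add("other-topic-" + i);
        }
        return names;
    }

    // Models the current behaviour described above: the client receives
    // every topic name and only then applies the regular expression.
    static List<String> filterLocally(List<String> allNames, Pattern subscription) {
        List<String> kept = new ArrayList<>();
        for (String name : allNames) {
            if (subscription.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sentOverWire = fakeTopicNames(50_000, 3);
        List<String> used = filterLocally(sentOverWire, Pattern.compile("abracadabra.*"));
        System.out.println("received=" + sentOverWire.size()
                + " used=" + used.size()
                + " discarded=" + (sentOverWire.size() - used.size()));
    }
}
```

If the same filter ran on the broker, only the entries in `used` would need to cross the network.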

best,
Colin

On Sat, Jul 14, 2018, at 01:06, Stephane Maarek wrote:
> Why not paginate ? Then one can retrieve as many topics as desired ?
> 
> On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, <cm...@apache.org> wrote:
> 
> > Good point.  We should probably have a maximum number of results like
> > 1000 or something.  That can go in the request RPC as well...
> > Cheers,
> > Colin
> >
> > On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote:
> > > bq. describe topics by a regular expression on the server side
> > >
> > > Should caution be taken if the regex doesn't filter ("*") ?
> > >
> > > Cheers
> > >
> > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe <cm...@apache.org> wrote:
> > >
> > > > As Jason wrote, this won't scale as the number of partitions
> > > > increases.  We already have users who have tens of thousands of
> > > > topics, or more.  If you multiply that by 100x over the next few
> > > > years, you end up with this API returning full information about
> > > > millions of topics, which clearly doesn't work.
> > > >
> > > > We discussed this a lot in the original KIP-117 DISCUSS thread
> > > > which added the Java AdminClient.  ListTopics and DescribeTopics
> > > > were deliberately kept separate because we understood that
> > > > eventually a single RPC would not be able to return information
> > > > about all the topics in the cluster.  So I have to vote -1 for
> > > > this proposal as it stands.
> > > >
> > > > I do agree that adding a way to describe topics by a regular
> > > > expression on the server side would be very useful.  This would
> > > > also fix a major scalability problem we have now, which is that
> > > > when subscribing via a regular expression, clients need to fetch
> > > > the full list of all topics in the cluster and filter locally.
> > > >
> > > > I think a regular expression library like re2 would be ideal for
> > > > this purpose.  re2 is standardized and language-agnostic (it's not
> > > > tied only to Java).  In contrast, Java regular expressions change
> > > > with different releases of the JDK (there were some changes in
> > > > Java 8, for example).  Also, re2 regular expressions are linear
> > > > time, never exponential time.  See
> > > > https://github.com/google/re2j
> > > >
> > > > regards,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> > > > > The KIP looks good to me.
> > > > > However, if there is willingness in the community to work on
> > > > > metadata request with patterns, the feature proposed here and
> > > > > filtering by '*' or '.*' would be redundant.
> > > > >
> > > > > Andras
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson <ja...@confluent.io> wrote:
> > > > >
> > > > > > Hey Manikumar,
> > > > > >
> > > > > > As Kafka begins to scale to larger and larger numbers of
> > > > > > topics/partitions, I'm a little concerned about the
> > > > > > scalability of APIs such as this. The API looks benign, but
> > > > > > imagine you have a few million partitions. We already expose
> > > > > > similar APIs in the producer and consumer, so probably not
> > > > > > much additional harm to expose it in the AdminClient, but it
> > > > > > would be nice to put a little thought into some longer term
> > > > > > options. We should be giving users an efficient way to select
> > > > > > a smaller set of the topics they are interested in. We have
> > > > > > always discussed adding some filtering support to the Metadata
> > > > > > API. Perhaps now is a good time to reconsider this? We now
> > > > > > have a convention for wildcard ACLs, so perhaps we can do
> > > > > > something similar. Full regex support might be ideal given the
> > > > > > consumer's subscription API, but that is more challenging.
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
> > > > >
> > > > > > > Very useful. LGTM.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Harsha
> > > > > > >
> > > > > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I have created a KIP to add describe all topics API to
> > > > > > > > AdminClient.
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > > > > > >
> > > > > > > > Please take a look.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > >
> > > > > >
> > > >
> >
> >

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Colin McCabe <cm...@apache.org>.
Thanks, Manikumar.

best,
Colin

On Tue, Jul 17, 2018, at 19:44, Manikumar wrote:
> Closing this KIP in favor of adding filtering support to the Metadata API
> and KIP-142. Will open a new KIP when ready.
> Thanks for your reviews.

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Manikumar <ma...@gmail.com>.
Closing this KIP in favor of adding filtering support to the Metadata API
and KIP-142. Will open a new KIP when ready.
Thanks for your reviews.

On Mon, Jul 16, 2018 at 8:38 AM Colin McCabe <cm...@apache.org> wrote:

> Thanks, Manikumar.  I've been meaning to bring up KIP-142 again.  It would
> definitely be a nice improvement.
>
> best,
> Colin

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Colin McCabe <cm...@apache.org>.
Thanks, Manikumar.  I've been meaning to bring up KIP-142 again.  It would definitely be a nice improvement.

best,
Colin


On Sat, Jul 14, 2018, at 08:51, Manikumar wrote:
> Hi Jason and Colin,
> 
> Thanks for the feedback. I agree that adding filtering support to the
> Metadata API would be useful and would solve the scalability issues.
>
> But regex support won't help with the specific use case of "describe all
> topics". In any case, the user needs to call listTopics() to get the topic
> list, and then make describeTopics() calls with a subset of the topic set.
> That points to improving the performance of the existing listTopics() API.
> Colin already raised a KIP for this: KIP-142
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+topics+in+a+cluster>.
> Maybe we should consider implementing KIP-142.
>
> Since we now support wildcard ACLs, I can initially explore support for
> prefixed/wildcard patterns in the Metadata API.
> We can later extend this to regular expressions.
> 
> Thanks

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Manikumar <ma...@gmail.com>.
Hi Jason and Colin,

Thanks for the feedback. I agree that adding filtering support to the
Metadata API would be useful and would solve the scalability issues.

But regex support won't help with the specific use case of "describe all
topics". In any case, the user needs to call listTopics() to get the topic
list, and then make describeTopics() calls with a subset of the topic set.
That points to improving the performance of the existing listTopics() API.
Colin already raised a KIP for this: KIP-142
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-142%3A+Add+ListTopicsRequest+to+efficiently+list+all+the+topics+in+a+cluster>.
Maybe we should consider implementing KIP-142.
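
As a rough sketch of that list-then-describe flow, the plain-Java helper below fetches the names once and then describes them in fixed-size batches. It is not tied to Kafka: the `lister` and `describer` arguments are hypothetical stand-ins for AdminClient's listTopics() and describeTopics() calls, and the class name, `describeAll` and `batchSize` are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;

final class DescribeAllTopicsSketch {

    // lister stands in for a listTopics() call, describer for a
    // describeTopics() call on one batch of names. Both are assumptions
    // of this sketch, not Kafka APIs.
    static <D> Map<String, D> describeAll(Supplier<Set<String>> lister,
                                          Function<Collection<String>, Map<String, D>> describer,
                                          int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Sort the names so the batches are stable from run to run.
        List<String> names = new ArrayList<>(new TreeSet<>(lister.get()));
        Map<String, D> result = new LinkedHashMap<>();
        for (int start = 0; start < names.size(); start += batchSize) {
            int end = Math.min(start + batchSize, names.size());
            // One "describe" round trip per batch, each over a subset of the names.
            result.putAll(describer.apply(names.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> topics = Set.of("orders", "payments", "audit");
        Map<String, Integer> described = describeAll(
                () -> topics,
                batch -> {
                    Map<String, Integer> m = new LinkedHashMap<>();
                    for (String t : batch) {
                        m.put(t, t.length()); // fake "description": the name length
                    }
                    return m;
                },
                2);
        System.out.println(described);
    }
}
```

The batch size bounds how much a single describe request returns, but the initial listing still has to return every topic name, which is the part KIP-142 aims to make cheaper.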

Since we now support wildcard ACLs, I can initially explore support for
prefixed/wildcard patterns in the Metadata API.
We can later extend this to regular expressions.
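
To illustrate what prefixed/wildcard matching on topic names could look like, here is a small sketch in plain Java. Everything in it is hypothetical: the `Kind` enum, the treatment of "*" as a match-all literal, and the method names are assumptions loosely modelled on the literal/prefixed distinction used for ACL patterns, not an existing Metadata API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TopicNameFilterSketch {

    // Hypothetical filter kinds for this sketch; not an existing Kafka enum.
    enum Kind { LITERAL, PREFIXED }

    // A LITERAL pattern matches one exact name, except that "*" is
    // treated as "match every topic". A PREFIXED pattern matches any
    // topic whose name starts with the pattern.
    static boolean matches(Kind kind, String pattern, String topic) {
        if (kind == Kind.LITERAL) {
            return pattern.equals("*") || pattern.equals(topic);
        }
        return topic.startsWith(pattern);
    }

    // Returns the topics selected by the given pattern, in input order.
    static List<String> select(Kind kind, String pattern, List<String> topics) {
        List<String> out = new ArrayList<>();
        for (String t : topics) {
            if (matches(kind, pattern, t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("orders", "orders-eu", "payments");
        System.out.println(select(Kind.PREFIXED, "orders", topics));
        System.out.println(select(Kind.LITERAL, "orders", topics));
        System.out.println(select(Kind.LITERAL, "*", topics));
    }
}
```

Prefix and exact matches of this kind are cheap to evaluate on the broker and avoid the question of which regular expression dialect to support.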

Thanks



On Sat, Jul 14, 2018 at 2:42 PM Ted Yu <yu...@gmail.com> wrote:

> What if broker crashes before all the pages can be returned ?
>
> Cheers
>
> On Sat, Jul 14, 2018 at 1:07 AM Stephane Maarek <
> stephane@simplemachines.com.au> wrote:
>
> > Why not paginate ? Then one can retrieve as many topics as desired ?
> >
> > On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, <cm...@apache.org> wrote:
> >
> > > Good point.  We should probably have a maximum number of results like
> > > 1000 or something.  That can go in the request RPC as well...
> > > Cheers,
> > > Colin
> > >
> > > On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote:
> > > > bq. describe topics by a regular expression on the server side
> > > >
> > > > Should caution be taken if the regex doesn't filter ("*") ?
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe
> > > > <cm...@apache.org> wrote:>
> > > > > As Jason wrote, this won't scale as the number of partitions
> > > > > increases.> > We already have users who have tens of thousands of
> > > topics, or
> > > > > more.  If> > you multiply that by 100x over the next few years, you
> > > end up with
> > > > > this API> > returning full information about millions of topics,
> > which
> > > clearly
> > > > > doesn't> > work.
> > > > >
> > > > > We discussed this a lot in the original KIP-117 DISCUSS thread
> > > > > which added> > the Java AdminClient.  ListTopics and DescribeTopics
> > > were
> > > > > deliberately kept> > separate because we understood that
> eventually a
> > > single RPC would
> > > > > not be> > able to return information about all the topics in the
> > > cluster.  So
> > > > > I have> > to vote -1 for this proposal as it stands.
> > > > >
> > > > > I do agree that adding a way to describe topics by a regular
> > > > > expression on> > the server side would be very useful.  This would
> > > also fix a major
> > > > > scalability problem we have now, which is that when
> > > > > subscribing via a> > regular expression, clients need to fetch the
> > > full list of all
> > > > > topics in> > the cluster and filter locally.
> > > > >
> > > > > I think a regular expression library like re2 would be ideal
> > > > > for this> > purpose.  re2 is standardized and language-agnostic
> (it's
> > > not tied
> > > > > only to> > Java).  In contrast, Java regular expression change with
> > > different
> > > > > releases> > of the JDK (there were some changes in java 8, for
> > > example).
> > > > > Also, re2> > regular expressions are linear time, never exponential
> > > time.  See
> > > > > https://github.com/google/re2j
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> > > > > > The KIP looks good to me.
> > > > > > However, if there is willingness in the community to work on
> > > > > > metadata> > > request with patterns, the feature proposed here
> and
> > > filtering by
> > > > > > '*' or> > > '.*' would be redundant.
> > > > > >
> > > > > > Andras
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson
> > > > > > <ja...@confluent.io>> > wrote:
> > > > > >
> > > > > > > Hey Manikumar,
> > > > > > >
> > > > > > > > As Kafka begins to scale to larger and larger numbers of
> > > > > > > > topics/partitions, I'm a little concerned about the scalability of
> > > > > > > > APIs such as this. The API looks benign, but imagine you have a few
> > > > > > > > million partitions. We already expose similar APIs in the producer
> > > > > > > > and consumer, so probably not much additional harm to expose it in
> > > > > > > > the AdminClient, but it would be nice to put a little thought into
> > > > > > > > some longer term options. We should be giving users an efficient way
> > > > > > > > to select a smaller set of the topics they are interested in. We
> > > > > > > > have always discussed adding some filtering support to the Metadata
> > > > > > > > API. Perhaps now is a good time to reconsider this? We now have a
> > > > > > > > convention for wildcard ACLs, so perhaps we can do something
> > > > > > > > similar. Full regex support might be ideal given the consumer's
> > > > > > > > subscription API, but that is more challenging. What do you think?
> > > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
> > > > > > > >
> > > > > > > > Very useful. LGTM.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Harsha
> > > > > > > >
> > > > > > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > > I have created a KIP to add describe all topics API to AdminClient.
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > > > > > > >
> > > > > > > > > Please take a look.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > >
> > > > > > >
> > > > >
> > >
> > >
> >
>

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Ted Yu <yu...@gmail.com>.
What if the broker crashes before all the pages can be returned?

Cheers

On Sat, Jul 14, 2018 at 1:07 AM Stephane Maarek <
stephane@simplemachines.com.au> wrote:

> Why not paginate? Then one can retrieve as many topics as desired.

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Stephane Maarek <st...@simplemachines.com.au>.
Why not paginate? Then one can retrieve as many topics as desired.

On Sat., 14 Jul. 2018, 4:15 pm Colin McCabe, <cm...@apache.org> wrote:

> Good point.  We should probably have a maximum number of results like
> 1000 or something.  That can go in the request RPC as well...
> Cheers,
> Colin

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Colin McCabe <cm...@apache.org>.
Good point.  We should probably have a maximum number of results like
1000 or something.  That can go in the request RPC as well...
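
A rough sketch of what such a cap could look like, in plain Java with no Kafka dependencies. Everything here is an assumption made for illustration: the class name CappedTopicMatch, the match method, and the truncated flag are hypothetical and not part of any existing RPC. The idea is only that a pattern which filters nothing still returns a bounded response, and the caller learns that results were cut off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of any Kafka RPC. All names are assumptions.
final class CappedTopicMatch {
    final List<String> topics;   // at most maxResults matching topic names
    final boolean truncated;     // true if at least one further match was dropped

    private CappedTopicMatch(List<String> topics, boolean truncated) {
        this.topics = topics;
        this.truncated = truncated;
    }

    // Collects matching topic names, stopping as soon as the cap would be exceeded.
    static CappedTopicMatch match(List<String> allTopics, Pattern pattern, int maxResults) {
        List<String> out = new ArrayList<>();
        for (String topic : allTopics) {
            if (!pattern.matcher(topic).matches()) {
                continue;
            }
            if (out.size() >= maxResults) {
                return new CappedTopicMatch(out, true);
            }
            out.add(topic);
        }
        return new CappedTopicMatch(out, false);
    }

    public static void main(String[] args) {
        List<String> all = List.of("a-1", "a-2", "a-3", "b-1");
        // ".*" filters nothing, so only the cap bounds the response.
        CappedTopicMatch r = match(all, Pattern.compile(".*"), 2);
        System.out.println(r.topics + " truncated=" + r.truncated);
    }
}
```

With a flag like this the client can tell a complete answer from a clipped one and narrow its pattern accordingly.
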
Cheers,
Colin

On Fri, Jul 13, 2018, at 18:15, Ted Yu wrote:
> bq. describe topics by a regular expression on the server side
>
> Should caution be taken if the regex doesn't filter anything ("*")?
>
> Cheers


Re: KIP-327: Add describe all topics API to AdminClient

Posted by Ted Yu <yu...@gmail.com>.
bq. describe topics by a regular expression on the server side

Should caution be taken if the regex doesn't filter anything ("*")?

Cheers

On Fri, Jul 13, 2018 at 6:02 PM Colin McCabe <cm...@apache.org> wrote:

> As Jason wrote, this won't scale as the number of partitions increases.
> We already have users who have tens of thousands of topics, or more.  If
> you multiply that by 100x over the next few years, you end up with this API
> returning full information about millions of topics, which clearly doesn't
> work.
>
> We discussed this a lot in the original KIP-117 DISCUSS thread which added
> the Java AdminClient.  ListTopics and DescribeTopics were deliberately kept
> separate because we understood that eventually a single RPC would not be
> able to return information about all the topics in the cluster.  So I have
> to vote -1 for this proposal as it stands.
>
> I do agree that adding a way to describe topics by a regular expression on
> the server side would be very useful.  This would also fix a major
> scalability problem we have now, which is that when subscribing via a
> regular expression, clients need to fetch the full list of all topics in
> the cluster and filter locally.
>
> I think a regular expression library like re2 would be ideal for this
> purpose.  re2 is standardized and language-agnostic (it's not tied only to
> Java).  In contrast, Java regular expressions change with different releases
> of the JDK (there were some changes in Java 8, for example).  Also, re2
> regular expressions are linear time, never exponential time.  See
> https://github.com/google/re2j
>
> regards,
> Colin
>
>
> On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> > The KIP looks good to me.
> > However, if there is willingness in the community to work on metadata
> > request with patterns, the feature proposed here and filtering by '*' or
> > '.*' would be redundant.
> >
> > Andras
> >
> >
> >
> > On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson <ja...@confluent.io> wrote:
> >
> > > Hey Manikumar,
> > >
> > > As Kafka begins to scale to larger and larger numbers of
> > > topics/partitions, I'm a little concerned about the scalability of APIs
> > > such as this. The API looks benign, but imagine you have a few million
> > > partitions. We already expose similar APIs in the producer and consumer,
> > > so probably not much additional harm to expose it in the AdminClient, but
> > > it would be nice to put a little thought into some longer term options.
> > > We should be giving users an efficient way to select a smaller set of the
> > > topics they are interested in. We have always discussed adding some
> > > filtering support to the Metadata API. Perhaps now is a good time to
> > > reconsider this? We now have a convention for wildcard ACLs, so perhaps
> > > we can do something similar. Full regex support might be ideal given the
> > > consumer's subscription API, but that is more challenging. What do you
> > > think?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
> > >
> > > > Very useful. LGTM.
> > > >
> > > > Thanks,
> > > > Harsha
> > > >
> > > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > > Hi all,
> > > > >
> > > > > > I have created a KIP to add describe all topics API to AdminClient.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > > >
> > > > > Please take a look.
> > > > >
> > > > > Thanks,
> > > >
> > >
>

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Colin McCabe <cm...@apache.org>.
As Jason wrote, this won't scale as the number of partitions increases.  We already have users who have tens of thousands of topics, or more.  If you multiply that by 100x over the next few years, you end up with this API returning full information about millions of topics, which clearly doesn't work.

We discussed this a lot in the original KIP-117 DISCUSS thread which added the Java AdminClient.  ListTopics and DescribeTopics were deliberately kept separate because we understood that eventually a single RPC would not be able to return information about all the topics in the cluster.  So I have to vote -1 for this proposal as it stands.

I do agree that adding a way to describe topics by a regular expression on the server side would be very useful.  This would also fix a major scalability problem we have now, which is that when subscribing via a regular expression, clients need to fetch the full list of all topics in the cluster and filter locally.
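
To make the cost of local filtering concrete, here is a minimal sketch in plain Java. It does not use or reproduce the real client internals: ClientSideTopicFilter and filterLocally are hypothetical names, and the in-memory list merely stands in for a full metadata response. The topic counts are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not Kafka client code.
final class ClientSideTopicFilter {
    // Applies the subscription pattern to every topic name that was sent to the client.
    static List<String> filterLocally(List<String> allTopics, Pattern subscription) {
        List<String> matched = new ArrayList<>();
        for (String topic : allTopics) {
            if (subscription.matcher(topic).matches()) {
                matched.add(topic);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Simulate a large cluster in which only a handful of topics are relevant.
        List<String> allTopics = new ArrayList<>();
        for (int i = 0; i < 49_997; i++) {
            allTopics.add("topic-" + i);
        }
        allTopics.add("abracadabra-1");
        allTopics.add("abracadabra-2");
        allTopics.add("abracadabra-3");

        List<String> matched = filterLocally(allTopics, Pattern.compile("abracadabra.*"));
        System.out.println(allTopics.size() + " names received, " + matched.size() + " kept");
    }
}
```

Every name in allTopics has to cross the wire before the loop runs, even though almost all of them are discarded; evaluating the pattern on the broker would avoid that transfer.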

I think a regular expression library like re2 would be ideal for this purpose.  re2 is standardized and language-agnostic (it's not tied only to Java).  In contrast, Java regular expressions change with different releases of the JDK (there were some changes in Java 8, for example).  Also, re2 regular expressions are linear time, never exponential time.  See https://github.com/google/re2j

regards,
Colin


On Fri, Jul 13, 2018, at 05:00, Andras Beni wrote:
> The KIP looks good to me.
> However, if there is willingness in the community to work on metadata
> request with patterns, the feature proposed here and filtering by '*' or
> '.*' would be redundant.
> 
> Andras
> 
> 
> 
> On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson <ja...@confluent.io> wrote:
> 
> > Hey Manikumar,
> >
> > As Kafka begins to scale to larger and larger numbers of topics/partitions,
> > I'm a little concerned about the scalability of APIs such as this. The API
> > looks benign, but imagine you have a few million partitions. We
> > already expose similar APIs in the producer and consumer, so probably not
> > much additional harm to expose it in the AdminClient, but it would be nice
> > to put a little thought into some longer term options. We should be giving
> > users an efficient way to select a smaller set of the topics they are
> > interested in. We have always discussed adding some filtering support to
> > the Metadata API. Perhaps now is a good time to reconsider this? We now
> > have a convention for wildcard ACLs, so perhaps we can do something
> > similar. Full regex support might be ideal given the consumer's
> > subscription API, but that is more challenging. What do you think?
> >
> > Thanks,
> > Jason
> >
> > On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
> >
> > > Very useful. LGTM.
> > >
> > > Thanks,
> > > Harsha
> > >
> > > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > > Hi all,
> > > >
> > > > I have created a KIP to add describe all topics API to AdminClient.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > > >
> > > > Please take a look.
> > > >
> > > > Thanks,
> > >
> >

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Andras Beni <an...@cloudera.com.INVALID>.
The KIP looks good to me.
However, if there is willingness in the community to work on metadata
request with patterns, the feature proposed here and filtering by '*' or
'.*' would be redundant.

Andras



On Fri, Jul 13, 2018 at 12:38 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Manikumar,
>
> As Kafka begins to scale to larger and larger numbers of topics/partitions,
> I'm a little concerned about the scalability of APIs such as this. The API
> looks benign, but imagine you have a few million partitions. We
> already expose similar APIs in the producer and consumer, so probably not
> much additional harm to expose it in the AdminClient, but it would be nice
> to put a little thought into some longer term options. We should be giving
> users an efficient way to select a smaller set of the topics they are
> interested in. We have always discussed adding some filtering support to
> the Metadata API. Perhaps now is a good time to reconsider this? We now
> have a convention for wildcard ACLs, so perhaps we can do something
> similar. Full regex support might be ideal given the consumer's
> subscription API, but that is more challenging. What do you think?
>
> Thanks,
> Jason
>
> On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:
>
> > Very useful. LGTM.
> >
> > Thanks,
> > Harsha
> >
> > On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > > Hi all,
> > >
> > > I have created a KIP to add describe all topics API to AdminClient.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> > >
> > > Please take a look.
> > >
> > > Thanks,
> >
>

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Manikumar,

As Kafka begins to scale to larger and larger numbers of topics/partitions,
I'm a little concerned about the scalability of APIs such as this. The API
looks benign, but imagine you have a few million partitions. We
already expose similar APIs in the producer and consumer, so probably not
much additional harm to expose it in the AdminClient, but it would be nice
to put a little thought into some longer term options. We should be giving
users an efficient way to select a smaller set of the topics they are
interested in. We have always discussed adding some filtering support to
the Metadata API. Perhaps now is a good time to reconsider this? We now
have a convention for wildcard ACLs, so perhaps we can do something
similar. Full regex support might be ideal given the consumer's
subscription API, but that is more challenging. What do you think?
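
As a rough illustration of a filter that is simpler than full regex support, the sketch below matches topic names literally, by prefix, or not at all. It is only loosely inspired by the wildcard ACL convention mentioned above; TopicNameFilter, its Mode enum, and the apply method are hypothetical names and do not correspond to any existing Kafka API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a non-regex topic filter; all names are assumptions.
final class TopicNameFilter {
    enum Mode { LITERAL, PREFIXED, ANY }

    private final Mode mode;
    private final String value;

    TopicNameFilter(Mode mode, String value) {
        this.mode = mode;
        this.value = value;
    }

    // Returns true if the topic name is selected by this filter.
    boolean matches(String topic) {
        switch (mode) {
            case LITERAL:
                return topic.equals(value);
            case PREFIXED:
                return topic.startsWith(value);
            default:
                return true; // ANY selects every topic
        }
    }

    // Keeps only the topic names selected by this filter, preserving order.
    List<String> apply(List<String> topics) {
        return topics.stream().filter(this::matches).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = List.of("orders-eu", "orders-us", "payments");
        System.out.println(new TopicNameFilter(Mode.PREFIXED, "orders-").apply(topics));
    }
}
```

A prefix check costs time proportional to the prefix length per topic and cannot backtrack, which is part of why it is easier to support on the broker than arbitrary regular expressions.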

Thanks,
Jason

On Thu, Jul 12, 2018 at 2:35 PM, Harsha <ka...@harsha.io> wrote:

> Very useful. LGTM.
>
> Thanks,
> Harsha
>
> On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> > Hi all,
> >
> > I have created a KIP to add describe all topics API to AdminClient.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> >
> > Please take a look.
> >
> > Thanks,
>

Re: KIP-327: Add describe all topics API to AdminClient

Posted by Harsha <ka...@harsha.io>.
Very useful. LGTM.

Thanks,
Harsha

On Thu, Jul 12, 2018, at 9:56 AM, Manikumar wrote:
> Hi all,
> 
> I have created a KIP to add describe all topics API to AdminClient.
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-327%3A+Add+describe+all+topics+API+to+AdminClient
> 
> Please take a look.
> 
> Thanks,