Posted to dev@kafka.apache.org by Lucas Bradstreet <lu...@confluent.io> on 2019/10/04 01:50:50 UTC

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Hi Brian,

This looks great, and should help reduce blocking and high metadata request
volumes when the producer is sending to large numbers of topics, especially
at low volumes. I think the approach to make metadata fetching asynchronous
and batch metadata requests together will help significantly.

The only other approach I can think of is to allow users to supply the
producer with the expected topics upfront, allowing the producer to perform
a single initial metadata request before any sends occur. I see no real
advantages to this approach compared to the async method you’ve proposed,
but maybe we could add it to the rejected alternatives section?

Thanks,

Lucas

On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io> wrote:

> I've updated the 'Proposed Changes' to include two new producer
> configuration variables: topic.expiry.ms and topic.refresh.ms. Please take
> a look.
>
> Thanks,
> Brian
>
> On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bb...@confluent.io> wrote:
>
> > Dev team,
> >
> > Requesting discussion for improvement to the producer when dealing with a
> > large number of topics.
> >
> > KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> >
> > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> >
> > Thoughts and feedback would be appreciated.
> >
> > Thanks,
> > Brian
> >
>
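
A later part of this thread refines the metadata request batching Lucas describes into an "urgent" set (topics with no cached metadata and data pending) and a "non-urgent" set (topics whose metadata is merely stale). A rough sketch of that selection logic, for illustration only; the function name and the target size of 25 are assumptions drawn from the discussion below, not the producer's actual code:

```python
TARGET_METADATA_FETCH_SIZE = 25  # illustrative target from the discussion

def topics_for_next_request(urgent, non_urgent,
                            target=TARGET_METADATA_FETCH_SIZE):
    """Pick topics for one metadata request: every urgent topic is always
    included (the target size never delays an urgent update), and
    non-urgent topics are piggybacked only while the batch stays under
    the target size."""
    batch = list(urgent)
    for topic in non_urgent:
        if len(batch) >= target:
            break
        batch.append(topic)
    return batch
```

For example, 100 urgent topics would all be fetched in a single request, while 2 urgent topics alongside 50 stale ones would yield a request for 25 topics in total.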

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Cool. Sounds good to me!

On Thu, Jan 2, 2020 at 2:21 PM Brian Byrne <bb...@confluent.io> wrote:

> Hi Guozhang,
>
> Only evicting due to memory pressure (or if the metadata is stale, to
> prevent infinitely refreshing no longer used topics when memory's not an
> issue) makes sense, and would be the most intuitive way of going about it.
>
> From an implementation perspective, it's a little more difficult because
> it'd require the producer's BufferPool to feed back into the Metadata
> object, introduce a sane limit on the consumer side, and add the concept of
> potentially evicting stale metadata instead of refreshing it. So it's a
> little involved, and I'd prefer to save that for a round 2, if necessary.
>
> I've updated the KIP to change the default eviction time to 2 minutes.
>
> Thanks,
> Brian
>
>
> On Thu, Jan 2, 2020 at 1:28 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > On Thu, Jan 2, 2020 at 12:42 PM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Hi Guozhang,
> > >
> > > You're correct in that, with the current defined default values, it's
> > > likely we'll be more proactive in refreshing metadata for
> > > once/rarely-touched topics, which isn't ideal. It'd be at most 2x the
> > rate,
> > > but we can certainly do better. I made the default eviction rate
> > > mirror the current hard-coded value in an effort to minimize surprise,
> > > but
> perhaps
> > > there's a more suitable middle ground.
> > >
> > > Your mental model is good. I also agree that having a more aggressive
> > > eviction rate, or at least one that's a fair amount shorter than the
> > topic
> > > refresh rate, behaves better for the once/rarely-touched topics and is
> > more
> > > appropriate overall. One thing to note though is that, if metadata is
> > > evicted + refetched, there'll be a small latency bubble for the
> producer.
> > >
> > > The concern I had with shortening the eviction rate dramatically was
> that
> > > it could have unexpected consequences for clusters that keep the
> default,
> > > but whose P1 is slightly longer than the value. For example, if the
> > > metadata.evict.ms is 30 seconds, a periodic access rate of 40 seconds
> > > could
> > > request the topic metadata up to 7 times during a 5-minute interval. Of
> > > course, this is easily fixed by setting the eviction rate appropriately
> > by
> > > the client, but perhaps lowering the default to a moderate ~2 minutes
> > might
> > > be a better balance to avoid surprise of worst-case behavior.
> > >
> > > Something else that could be considered is defining a "soft topic
> > metadata
> > > cache size", which would only evict unused topics (last used >=
> > > metadata.evict.ms) if the soft cache size was exceeded or the metadata
> > was
> > > stale. However, it still doesn't save the client from degenerate
> behavior
> > > where P1 + e ~= metadata.evict.ms, so I'm more partial to keeping the
> > > eviction rate to ~2+ minutes.
> > >
> > > This makes sense.
> >
> > I think since the scenario that a producer sends to a topic once and
> never
> > did again is not very common, and more commonly a single producer would
> > only send to a very small subset of topics hosted on the cluster during
> its
> > life time, we can actually use a cache size to guide when should we
> > consider eviction (my understanding is that the memory pressure is the
> only
> > reason we want to evict, lmk if there are other motivations). A wild
> > thought is that we do not even need to introduce a new config since we
> have
> > a `buffer.size` already and we can calculate the exact bytes-size of each
> > topic metadata entry as well, so we can, say, consider evicting only when
> > that buffer size is reached together with buffered data AND metadata in
> > bytes. Then in practice we would not need to evict metadata anyways, AND
> if
> > we do need to, having some aggressive eviction threshold is more valid
> (btw
> > I think I also agree that something like 2minutes is fine).
> >
> >
> > > Yet another option is to have topics in the "U" set (i.e. topics
> > requesting
> > > non-urgent metadata update) exclude those that are nearing their
> eviction
> > > duration. I fear this might be over-engineering a potential non-problem
> > > though. Thoughts?
> > >
> >
> > Yeah I also feel it is a bit overkill for this.
> >
> >
> > >
> > > Thanks,
> > > Brian
> > >
> > >
> > >
> > > On Thu, Jan 2, 2020 at 11:44 AM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > > > Hello Brian,
> > > >
> > > > For the newly added `metadata.evict.ms` config, since its default
> > value
> > > is
> > > > set as the same as `metadata.max.age.ms` and most users would not
> > > override
> > > > default config values, could it happen in scenarios where a producer
> > > > sends to a specific topic once and never sends again that we end up
> > > > refreshing metadata that is about to be removed soon? In other
> > words
> > > > I'm wondering if for common cases the eviction period should be
> larger
> > > than
> > > > max.age or vice versa. This is my mental model:
> > > >
> > > > 1) If we evict a topic eagerly compared with refreshing it, e.g.
> after
> > > time
> > > > T1 has elapsed, and then after T2 (> T1) elapsed that topic is
> > requested
> > > > for sending again, we would force a metadata update which includes
> this
> > > > topic (remember we now always batch a set of topics in each metadata
> > > > request)
> > > > 2) if we evict a topic more lazily compared with refreshing it, say
> > after
> > > > T2 has elapsed, but before that after T1 (< T2) we have to refresh it
> > > > already, it is included in that metadata request.
> > > >
> > > > So from the number of metadata requests it may incur, evicting more
> > > eagerly
> > > > seems okay --- i.e. even if we do not evict this topic, it would
> still
> > > > cause a metadata request after the max.age.ms anyways so evicting it
> > > early
> > > > does not make things worse. More generally, if we know a topic's
> > sending
> > > > frequency is P1 (i.e. every P1 seconds we send once for that topic),
> > and
> > > > eviction period is P2, and the metadata.max.age is P3. Then the
> optimal
> > > > scenario in terms of metadata update frequency seems to be P1 = P2 <
> > P3.
> > > So
> > > > I feel making the default value for metadata.evict.ms smaller is
> fine,
> > > > e.g.
> > > > say 1minute or 30seconds. WDYT?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Dec 26, 2019 at 7:08 AM Brian Byrne <bb...@confluent.io>
> > wrote:
> > > >
> > > > > Hi Stanislav,
> > > > >
> > > > > Appreciate the feedback!
> > > > >
> > > > > 1. You're correct. I've added notes to the KIP to clarify.
> > > > >
> > > > > 2. Yes it should. Fixed.
> > > > >
> > > > > 3. So I made a mistake when generalizing the target refresh size,
> > which
> > > > > should have been using `metadata.max.age.ms` instead of `
> > > > metadata.evict.ms
> > > > > `.
> > > > > Therefore, `metadata.max.age.ms` still controls the target refresh
> > > time
> > > > of
> > > > > active topic metadata, and doesn't change in that regard. You're
> > right
> > > in
> > > > > that the U set will make metadata refreshing more proactive in some
> > > > cases,
> > > > > which is done to smooth out the metadata RPCs over time and lessen
> > the
> > > > > potential bulk of some of the RPC responses by working in smaller
> > > units.
> > > > > The T set is what logically exists today and doesn't change.
> > > > >
> > > > > Thanks,
> > > > > Brian
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <
> > > > > stanislav@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hey Brian,
> > > > > >
> > > > > > 1. Could we more explicitly clarify the behavior of the algorithm
> > > > > > when `|T| > TARGET_METADATA_FETCH_SIZE`? I assume we ignore the
> > > > > > config in that scenario
> > > > > > 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10,
> > 20)`
> > > > be
> > > > > > `topicsPerSec * 10` ?
> > > > > > 3. When is this new algorithm applied? To confirm my
> understanding
> > -
> > > > what
> > > > > > is the behavior of `metadata.max.age.ms` after this KIP? Are we
> > > > adding a
> > > > > > new, more proactive metadata fetch for topics in |U|?
> > > > > >
> > > > > > Thanks,
> > > > > > Stanislav
> > > > > >
> > > > > > On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <
> bbyrne@confluent.io>
> > > > > wrote:
> > > > > >
> > > > > > > Hello everyone,
> > > > > > >
> > > > > > > For all interested, please take a look at the proposed
> algorithm
> > as
> > > > I'd
> > > > > > > like to get more feedback. I'll call for a vote once the break
> is
> > > > over.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Brian
> > > > > > >
> > > > > > > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <
> > wangguoz@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Sounds good, I agree that should not make a big difference in
> > > > > practice.
> > > > > > > >
> > > > > > > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <
> > bbyrne@confluent.io>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Guozhang,
> > > > > > > > >
> > > > > > > > > I see, we agree on the topic threshold not applying to
> urgent
> > > > > topics,
> > > > > > > but
> > > > > > > > > differ slightly on what should be considered urgent. I
> would
> > > > argue
> > > > > > that
> > > > > > > > we
> > > > > > > > > should consider topics nearing the metadata.max.age.ms to
> be
> > > > > urgent
> > > > > > > > since
> > > > > > > > > they may still be well within the metadata.expiry.ms. That
> > is,
> > > > the
> > > > > > > > client
> > > > > > > > > still considers these topics to be relevant (not expired),
> > but
> > > > > > doesn't
> > > > > > > > want
> > > > > > > > > to incur the latency bubble of having to wait for the
> > metadata
> > > to
> > > > > be
> > > > > > > > > re-fetched if it's stale. This could be a frequent case if
> > the
> > > > > > > > > metadata.max.age.ms << metadata.expiry.ms.
> > > > > > > > >
> > > > > > > > > In practice, I wouldn't expect this to make a noticeable
> > > > difference
> > > > > > so
> > > > > > > I
> > > > > > > > > don't have a strong leaning, but the current behavior today
> > is
> > > to
> > > > > > > > > aggressively refresh the metadata of stale topics by
> > ensuring a
> > > > > > refresh
> > > > > > > > is
> > > > > > > > > triggered before that metadata.max.age.ms duration
> elapses.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Brian
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <
> > > > wangguoz@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello Brian,
> > > > > > > > > >
> > > > > > > > > > Thanks for your explanation, could you then update the
> wiki
> > > > page
> > > > > > for
> > > > > > > > the
> > > > > > > > > > algorithm part since when I read it, I thought it was
> > > different
> > > > > > from
> > > > > > > > the
> > > > > > > > > > above, e.g. urgent topics should not be added just
> because
> > of
> > > > > > max.age
> > > > > > > > > > expiration, but should only be added if there are sending
> > > data
> > > > > > > pending.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Guozhang
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <
> > > > bbyrne@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Guozhang,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the feedback!
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <
> > > > > wangguoz@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > 1. The addition of *metadata.expiry.ms* should
> > > > > > > > > > > > be included in the public interface. Also its
> semantics
> > > > needs
> > > > > > > more
> > > > > > > > > > > > clarification (since previously it is hard-coded
> > > internally
> > > > > we
> > > > > > do
> > > > > > > > not
> > > > > > > > > > > need
> > > > > > > > > > > > to explain it publicly, but now with the configurable
> > > value
> > > > > we
> > > > > > do
> > > > > > > > > > need).
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This was an oversight. Done.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 2. There are a couple of hard-coded parameters like
> 25
> > > and
> > > > > 0.5
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > > proposal, maybe we need to explain why these magic
> > values
> > > > > makes
> > > > > > > > sense
> > > > > > > > > > in
> > > > > > > > > > > > common scenarios.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > So these are pretty fuzzy numbers, and seemed to be a
> > > decent
> > > > > > > balance
> > > > > > > > > > > between trade-offs. I've updated the target size to
> > account
> > > > for
> > > > > > > > setups
> > > > > > > > > > with
> > > > > > > > > > > a large number of topics or a shorter refresh time, as
> > well
> > > > as
> > > > > > > added
> > > > > > > > > some
> > > > > > > > > > > light rationale.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 3. In the Urgent set condition, do you actually mean
> > > "with
> > > > no
> > > > > > > > cached
> > > > > > > > > > > > metadata AND there are existing data buffered for the
> > > > topic"?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, fixed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > One concern I have is whether or not we may
> introduce a
> > > > > > > regression,
> > > > > > > > > > > > especially during producer startup such that since we
> > > only
> > > > > > > require
> > > > > > > > up
> > > > > > > > > > to
> > > > > > > > > > > 25
> > > > > > > > > > > > topics each request, it may cause the send data to be
> > > > > buffered
> > > > > > > more
> > > > > > > > > > time
> > > > > > > > > > > > than now due to metadata not available. I understand
> > this
> > > > is
> > > > > a
> > > > > > > > > > > acknowledged
> > > > > > > > > > > > trade-off in our design but any regression that may
> > > surface
> > > > > to
> > > > > > > > users
> > > > > > > > > > need
> > > > > > > > > > > > to be very carefully considered. I'm wondering, e.g.
> if
> > > we
> > > > > can
> > > > > > > > tweak
> > > > > > > > > > our
> > > > > > > > > > > > algorithm for the Urgent set, e.g. to consider those
> > with
> > > > non
> > > > > > > > cached
> > > > > > > > > > > > metadata have higher priority than those who have
> > elapsed
> > > > > > max.age
> > > > > > > > but
> > > > > > > > > > not
> > > > > > > > > > > > yet have been called for sending. More specifically:
> > > > > > > > > > > >
> > > > > > > > > > > > Urgent: topics that have been requested for sending
> but
> > > no
> > > > > > cached
> > > > > > > > > > > metadata,
> > > > > > > > > > > > and topics that have send request failed with e.g.
> > > > > NOT_LEADER.
> > > > > > > > > > > > Non-urgent: topics that are not in Urgent but have
> > > expired
> > > > > > > max.age.
> > > > > > > > > > > >
> > > > > > > > > > > > Then when sending metadata, we always send ALL in the
> > > > urgent
> > > > > > > (i.e.
> > > > > > > > > > ignore
> > > > > > > > > > > > the size limit), and only when they do not exceed the
> > > size
> > > > > > limit,
> > > > > > > > > > > consider
> > > > > > > > > > > > fill in more topics from Non-urgent up to the size
> > limit.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think we're on the same page here. Urgent topics
> ignore
> > > the
> > > > > > > target
> > > > > > > > > > > metadata RPC size, and are not bound by it in any way,
> > i.e.
> > > > if
> > > > > > > > there's
> > > > > > > > > > 100
> > > > > > > > > > > urgent topics, we'll fetch all 100 in a single RPC.
> Like
> > > you
> > > > > say,
> > > > > > > > > > however,
> > > > > > > > > > > if a topic transitions to urgent and there's several
> > > > non-urgent
> > > > > > > ones,
> > > > > > > > > > we'll
> > > > > > > > > > > piggyback the non-urgent updates up to the target size.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >  Brian
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > > > > > > > dengziming1993@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I think it's ok, and you can add another issue
> about
> > > > > > > > `asynchronous
> > > > > > > > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> > > > > > > bbyrne@confluent.io
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've refactored the KIP to remove implementing
> > > > > asynchronous
> > > > > > > > > > metadata
> > > > > > > > > > > > > > fetching in the producer during send(). It's now
> > > > > > exclusively
> > > > > > > > > > focused
> > > > > > > > > > > on
> > > > > > > > > > > > > > reducing the topic metadata fetch payload and
> > > proposes
> > > > > > > adding a
> > > > > > > > > new
> > > > > > > > > > > > > > configuration flag to control topic expiry
> > behavior.
> > > > > Please
> > > > > > > > take
> > > > > > > > > a
> > > > > > > > > > > look
> > > > > > > > > > > > > > when possible.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > > > > > > > bbyrne@confluent.io
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lucas, Guozhang,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you for the comments. Good point on
> > > > > > > > > > METADATA_MAX_AGE_CONFIG -
> > > > > > > > > > > > it
> > > > > > > > > > > > > > > looks like the ProducerMetadata was
> > differentiating
> > > > > > between
> > > > > > > > > > expiry
> > > > > > > > > > > > and
> > > > > > > > > > > > > > > refresh, but it should be unnecessary to do so
> > once
> > > > the
> > > > > > > cost
> > > > > > > > of
> > > > > > > > > > > > > fetching
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > single topic's metadata is reduced.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I've updated the rejected alternatives and
> > removed
> > > > the
> > > > > > > config
> > > > > > > > > > > > > variables.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > > > > > > > wangguoz@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Hello Brian,
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Thanks for the KIP.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> I think using asynchronous metadata update to
> > > > address
> > > > > 1)
> > > > > > > > > > metadata
> > > > > > > > > > > > > update
> > > > > > > > > > > > > > >> blocking send, but for other issues, currently
> > at
> > > > > > producer
> > > > > > > > we
> > > > > > > > > do
> > > > > > > > > > > > have
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar
> > to
> > > > > > > consumer,
> > > > > > > > by
> > > > > > > > > > > > default
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > >> 5min. So maybe we do not need to introduce new
> > > > configs
> > > > > > > here,
> > > > > > > > > but
> > > > > > > > > > > > only
> > > > > > > > > > > > > > >> change the semantics of that config from
> global
> > > > expiry
> > > > > > > > (today
> > > > > > > > > we
> > > > > > > > > > > > just
> > > > > > > > > > > > > > >> enforce a full metadata update for the whole
> > > > cluster)
> > > > > to
> > > > > > > > > > > > single-topic
> > > > > > > > > > > > > > >> expiry, and we can also extend its expiry
> > deadline
> > > > > > > whenever
> > > > > > > > > that
> > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > >> is successfully used to send a produce
> request.
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Guozhang
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas
> Bradstreet
> > <
> > > > > > > > > > > lucas@confluent.io
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > Hi Brian,
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > This looks great, and should help reduce
> > > blocking
> > > > > and
> > > > > > > high
> > > > > > > > > > > > metadata
> > > > > > > > > > > > > > >> request
> > > > > > > > > > > > > > >> > volumes when the producer is sending to
> large
> > > > > numbers
> > > > > > of
> > > > > > > > > > topics,
> > > > > > > > > > > > > > >> especially
> > > > > > > > > > > > > > >> > at low volumes. I think the approach to make
> > > > > metadata
> > > > > > > > > fetching
> > > > > > > > > > > > > > >> asynchronous
> > > > > > > > > > > > > > >> > and batch metadata requests together will
> help
> > > > > > > > > significantly.
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > The only other approach I can think of is to
> > > allow
> > > > > > users
> > > > > > > > to
> > > > > > > > > > > supply
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > >> > producer with the expected topics upfront,
> > > > allowing
> > > > > > the
> > > > > > > > > > producer
> > > > > > > > > > > > to
> > > > > > > > > > > > > > >> perform
> > > > > > > > > > > > > > >> > a single initial metadata request before any
> > > sends
> > > > > > > occur.
> > > > > > > > I
> > > > > > > > > > see
> > > > > > > > > > > no
> > > > > > > > > > > > > > real
> > > > > > > > > > > > > > >> > advantages to this approach compared to the
> > > async
> > > > > > method
> > > > > > > > > > you’ve
> > > > > > > > > > > > > > >> proposed,
> > > > > > > > > > > > > > >> > but maybe we could add it to the rejected
> > > > > alternatives
> > > > > > > > > > section?
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Lucas
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > > > > > > > bbyrne@confluent.io>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > > I've updated the 'Proposed Changes' to
> > include
> > > > two
> > > > > > new
> > > > > > > > > > > producer
> > > > > > > > > > > > > > >> > > configuration variables: topic.expiry.ms
> > and
> > > > > > > > > > topic.refresh.ms
> > > > > > > > > > > .
> > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > >> > take
> > > > > > > > > > > > > > >> > > a look.
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > Thanks,
> > > > > > > > > > > > > > >> > > Brian
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian
> > Byrne <
> > > > > > > > > > > > bbyrne@confluent.io
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > > Dev team,
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Requesting discussion for improvement to
> > the
> > > > > > > producer
> > > > > > > > > when
> > > > > > > > > > > > > dealing
> > > > > > > > > > > > > > >> > with a
> > > > > > > > > > > > > > >> > > > large number of topics.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > KIP:
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > JIRA:
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Thoughts and feedback would be
> > appreciated.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > > >> > > > Brian
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> --
> > > > > > > > > > > > > > >> -- Guozhang
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -- Guozhang
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best,
> > > > > > Stanislav
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang
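
Brian's worst-case arithmetic above (a 30-second metadata.evict.ms against a 40-second access period producing up to 7 extra metadata requests over 5 minutes) can be checked with a toy model. This is only a sketch of the eviction timing; it ignores max-age refreshes and everything else the real producer does:

```python
def extra_metadata_fetches(access_period_ms, evict_ms, horizon_ms):
    """Count metadata re-fetches (beyond the unavoidable first fetch) for a
    topic sent to every access_period_ms milliseconds, when metadata left
    unused for longer than evict_ms is evicted and must be re-fetched on
    the next send."""
    fetches = 0
    last_use = 0  # the send at t=0 fetches metadata; not counted here
    t = access_period_ms
    while t < horizon_ms:
        if t - last_use > evict_ms:  # entry was evicted during the gap
            fetches += 1
        last_use = t
        t += access_period_ms
    return fetches

# 30s eviction vs. a 40s access period: every later send misses the cache.
print(extra_metadata_fetches(40_000, 30_000, 300_000))   # 7
# The proposed 2-minute default covers the same access pattern.
print(extra_metadata_fetches(40_000, 120_000, 300_000))  # 0
```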

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hi Guozhang,

Only evicting due to memory pressure (or if the metadata is stale, to
prevent infinitely refreshing no longer used topics when memory's not an
issue) makes sense, and would be the most intuitive way of going about it.

From an implementation perspective, it's a little more difficult because
it'd require the producer's BufferPool to feed back into the Metadata
object, introduce a sane limit on the consumer side, and add the concept of
potentially evicting stale metadata instead of refreshing it. So it's a
little involved, and I'd prefer to save that for a round 2, if necessary.

I've updated the KIP to change the default eviction time to 2 minutes.

Thanks,
Brian


On Thu, Jan 2, 2020 at 1:28 PM Guozhang Wang <wa...@gmail.com> wrote:

> On Thu, Jan 2, 2020 at 12:42 PM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hi Guozhang,
> >
> > You're correct in that, with the current defined default values, it's
> > likely we'll be more proactive in refreshing metadata for
> > once/rarely-touched topics, which isn't ideal. It'd be at most 2x the
> rate,
> > but we can certainly do better. I made the default eviction rate to
> mirror
> > the current hard-coded value in effort to minimize surprise, but perhaps
> > there's a more suitable middle ground.
> >
> > Your mental model is good. I also agree that having a more aggressive
> > eviction rate, or at least one that's a fair amount shorter than the
> topic
> > refresh rate, behaves better for the once/rarely-touched topics and is
> more
> > appropriate overall. One thing to note though is that, if metadata is
> > evicted + refetched, there'll be a small latency bubble for the producer.
> >
> > The concern I had with shortening the eviction rate dramatically was that
> > it could have unexpected consequences for clusters that keep the default,
> > but whose P1 is slightly longer than the value. For example, if the
> > metadata.evict.ms is 30 seconds, a periodic access rate of 40 seconds
> > could
> > request the topic metadata up to 7 times during a 5-minute interval. Of
> > course, this is easily fixed by setting the eviction rate appropriately
> by
> > the client, but perhaps lowering the default to a moderate ~2 minutes
> might
> > be a better balance to avoid surprise of worst-case behavior.
> >
> > Something else that could be considered is defining a "soft topic
> metadata
> > cache size", which would only evict unused topics (last used >=
> > metadata.evict.ms) if the soft cache size was exceeded or the metadata
> was
> > stale. However, it still doesn't save the client from degenerate behavior
> > where P1 + e ~= metadata.evict.ms, so I'm more partial to keeping the
> > eviction rate to ~2+ minutes.
> >
> > This makes sense.
>
> I think since the scenario that a producer sends to a topic once and never
> did again is not very common, and more commonly a single producer would
> only send to a very small subset of topics hosted on the cluster during its
> life time, we can actually use a cache size to guide when should we
> consider eviction (my understanding is that the memory pressure is the only
> reason we want to evict, lmk if there are other motivations). A wild
> thought is that we do not even need to introduce a new config since we have
> a `buffer.size` already and we can calculate the exact bytes-size of each
> topic metadata entry as well, so we can, say, consider evicting only when
> that buffer size is reached together with buffered data AND metadata in
> bytes. Then in practice we would not need to evict metadata anyways, AND if
> we do need to, having some aggressive eviction threshold is more valid (btw
> I think I also agree that something like 2minutes is fine).
>
>
> > Yet another option is to have topics in the "U" set (i.e. topics
> requesting
> > non-urgent metadata update) exclude those that are nearing their eviction
> > duration. I fear this might be over-engineering a potential non-problem
> > though. Thoughts?
> >
>
> Yeah I also feel it is a bit overkill for this.
>
>
> >
> > Thanks,
> > Brian
> >
> >
> >
> > On Thu, Jan 2, 2020 at 11:44 AM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Hello Brian,
> > >
> > > For the newly added `metadata.evict.ms` config, since its default
> value
> > is
> > > set as the same as `metadata.max.age.ms` and most users would not
> > override
> > > default config values, would it be possible for scenarios that producer
> > > send to a specific topic once and never send again, we may end up
> > > refreshing the metadata that are about to be removed soon? In other
> words
> > > I'm wondering if for common cases the eviction period should be larger
> > than
> > > max.age or vice versa. This is my mental model:
> > >
> > > 1) If we evict a topic eagerly compared with refreshing it, e.g. after
> > time
> > > T1 has elapsed, and then after T2 (> T1) elapsed that topic is
> requested
> > > for sending again, we would force a metadata update which includes this
> > > topic (remember we now always batch a set of topics in each metadata
> > > request)
> > > 2) if we evict a topic more lazily compared with refreshing it, say
> after
> > > T2 has elapsed, but before that after T1 (< T2) we have to refresh it
> > > already, it is included in that metadata request.
> > >
> > > So from the number of metadata requests it may incur, evicting more
> > eagerly
> > > seems okay --- i.e. even if we do not evict this topic, it would still
> > > cause a metadata request after the max.age.ms anyways so evicting it
> > early
> > > does not make things worse. More generally, if we know a topic's
> sending
> > > frequency is P1 (i.e. every P1 seconds we send once for that topic),
> and
> > > eviction period is P2, and the metadata.max.age is P3. Then the optimal
> > > scenario in terms of metadata update frequency seems to be P1 = P2 <
> P3.
> > So
> > > I feel making the default value for metadata.evict.ms smaller is fine,
> > > e.g.
> > > say 1 minute or 30 seconds. WDYT?
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Dec 26, 2019 at 7:08 AM Brian Byrne <bb...@confluent.io>
> wrote:
> > >
> > > > Hi Stanislav,
> > > >
> > > > Appreciate the feedback!
> > > >
> > > > 1. You're correct. I've added notes to the KIP to clarify.
> > > >
> > > > 2. Yes it should. Fixed.
> > > >
> > > > 3. So I made a mistake when generalizing the target refresh size,
> which
> > > > should have been using `metadata.max.age.ms` instead of `
> > > metadata.evict.ms
> > > > `.
> > > > Therefore, `metadata.max.age.ms` still controls the target refresh
> > time
> > > of
> > > > active topic metadata, and doesn't change in that regard. You're
> right
> > in
> > > > that the U set will make metadata refreshing more proactive in some
> > > cases,
> > > > which is done to smooth out the metadata RPCs over time and lessen
> the
> > > > potential bulk of some of the RPC responses by working in smaller
> > units.
> > > > The T set is what logically exists today and doesn't change.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <
> > > > stanislav@confluent.io>
> > > > wrote:
> > > >
> > > > > Hey Brian,
> > > > >
> > > > > 1. Could we more explicitly clarify the behavior of the algorithm
> > when
> > > > `|T|
> > > > > > TARGET_METADATA_FETCH SIZE` ? I assume we ignore the config in
> that
> > > > > scenario
> > > > > 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10,
> 20)`
> > > be
> > > > > `topicsPerSec * 10` ?
> > > > > 3. When is this new algorithm applied? To confirm my understanding
> -
> > > what
> > > > > is the behavior of `metadata.max.age.ms` after this KIP? Are we
> > > adding a
> > > > > new, more proactive metadata fetch for topics in |U|?
> > > > >
> > > > > Thanks,
> > > > > Stanislav
> > > > >
> > > > > On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io>
> > > > wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > For all interested, please take a look at the proposed algorithm
> as
> > > I'd
> > > > > > like to get more feedback. I'll call for a vote once the break is
> > > over.
> > > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > >
> > > > > > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <
> wangguoz@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Sounds good, I agree that should not make a big difference in
> > > > practice.
> > > > > > >
> > > > > > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <
> bbyrne@confluent.io>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Guozhang,
> > > > > > > >
> > > > > > > > I see, we agree on the topic threshold not applying to urgent
> > > > topics,
> > > > > > but
> > > > > > > > differ slightly on what should be considered urgent. I would
> > > argue
> > > > > that
> > > > > > > we
> > > > > > > > should consider topics nearing the metadata.max.age.ms to be
> > > > urgent
> > > > > > > since
> > > > > > > > they may still be well within the metadata.expiry.ms. That
> is,
> > > the
> > > > > > > client
> > > > > > > > still considers these topics to be relevant (not expired),
> but
> > > > > doesn't
> > > > > > > want
> > > > > > > > to incur the latency bubble of having to wait for the
> metadata
> > to
> > > > be
> > > > > > > > re-fetched if it's stale. This could be a frequent case if
> the
> > > > > > > > metadata.max.age.ms << metadata.expiry.ms.
> > > > > > > >
> > > > > > > > In practice, I wouldn't expect this to make a noticeable
> > > difference
> > > > > so
> > > > > > I
> > > > > > > > don't have a strong leaning, but the current behavior today
> is
> > to
> > > > > > > > aggressively refresh the metadata of stale topics by
> ensuring a
> > > > > refresh
> > > > > > > is
> > > > > > > > triggered before that metadata.max.age.ms duration elapses.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <
> > > wangguoz@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello Brian,
> > > > > > > > >
> > > > > > > > > Thanks for your explanation, could you then update the wiki
> > > page
> > > > > for
> > > > > > > the
> > > > > > > > > algorithm part since when I read it, I thought it was
> > different
> > > > > from
> > > > > > > the
> > > > > > > > > above, e.g. urgent topics should not be added just because
> of
> > > > > max.age
> > > > > > > > > expiration, but should only be added if there are sending
> > data
> > > > > > pending.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Guozhang
> > > > > > > > >
> > > > > > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <
> > > bbyrne@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Guozhang,
> > > > > > > > > >
> > > > > > > > > > Thanks for the feedback!
> > > > > > > > > >
> > > > > > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <
> > > > wangguoz@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > 1. The addition of *metadata.expiry.ms <
> > > > > > http://metadata.expiry.ms>
> > > > > > > > > > *should
> > > > > > > > > > > be included in the public interface. Also its semantics
> > > needs
> > > > > > more
> > > > > > > > > > > clarification (since previously it is hard-coded
> > internally
> > > > we
> > > > > do
> > > > > > > not
> > > > > > > > > > need
> > > > > > > > > > > to explain it publicly, but now with the configurable
> > value
> > > > we
> > > > > do
> > > > > > > > > need).
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This was an oversight. Done.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 2. There are a couple of hard-coded parameters like 25
> > and
> > > > 0.5
> > > > > in
> > > > > > > the
> > > > > > > > > > > proposal, maybe we need to explain why these magic
> values
> > > > makes
> > > > > > > sense
> > > > > > > > > in
> > > > > > > > > > > common scenarios.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > So these are pretty fuzzy numbers, and seemed to be a
> > decent
> > > > > > balance
> > > > > > > > > > between trade-offs. I've updated the target size to
> account
> > > for
> > > > > > > setups
> > > > > > > > > with
> > > > > > > > > > a large number of topics or a shorter refresh time, as
> well
> > > as
> > > > > > added
> > > > > > > > some
> > > > > > > > > > light rationale.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > 3. In the Urgent set condition, do you actually mean
> > "with
> > > no
> > > > > > > cached
> > > > > > > > > > > metadata AND there are existing data buffered for the
> > > topic"?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes, fixed.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > One concern I have is whether or not we may introduce a
> > > > > > regression,
> > > > > > > > > > > especially during producer startup such that since we
> > only
> > > > > > require
> > > > > > > up
> > > > > > > > > to
> > > > > > > > > > 25
> > > > > > > > > > > topics each request, it may cause the send data to be
> > > > buffered
> > > > > > more
> > > > > > > > > time
> > > > > > > > > > > than now due to metadata not available. I understand
> this
> > > is
> > > > a
> > > > > > > > > > acknowledged
> > > > > > > > > > > trade-off in our design but any regression that may
> > surface
> > > > to
> > > > > > > users
> > > > > > > > > need
> > > > > > > > > > > to be very carefully considered. I'm wondering, e.g. if
> > we
> > > > can
> > > > > > > tweak
> > > > > > > > > our
> > > > > > > > > > > algorithm for the Urgent set, e.g. to consider those
> with
> > > non
> > > > > > > cached
> > > > > > > > > > > metadata have higher priority than those who have
> elapsed
> > > > > max.age
> > > > > > > but
> > > > > > > > > not
> > > > > > > > > > > yet have been called for sending. More specifically:
> > > > > > > > > > >
> > > > > > > > > > > Urgent: topics that have been requested for sending but
> > no
> > > > > cached
> > > > > > > > > > metadata,
> > > > > > > > > > > and topics that have send request failed with e.g.
> > > > NOT_LEADER.
> > > > > > > > > > > Non-urgent: topics that are not in Urgent but have
> > expired
> > > > > > max.age.
> > > > > > > > > > >
> > > > > > > > > > > Then when sending metadata, we always send ALL in the
> > > urgent
> > > > > > (i.e.
> > > > > > > > > ignore
> > > > > > > > > > > the size limit), and only when they do not exceed the
> > size
> > > > > limit,
> > > > > > > > > > consider
> > > > > > > > > > > fill in more topics from Non-urgent up to the size
> limit.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think we're on the same page here. Urgent topics ignore
> > the
> > > > > > target
> > > > > > > > > > metadata RPC size, and are not bound by it in any way,
> i.e.
> > > if
> > > > > > > there's
> > > > > > > > > 100
> > > > > > > > > > urgent topics, we'll fetch all 100 in a single RPC. Like
> > you
> > > > say,
> > > > > > > > > however,
> > > > > > > > > > if a topic transitions to urgent and there's several
> > > non-urgent
> > > > > > ones,
> > > > > > > > > we'll
> > > > > > > > > > piggyback the non-urgent updates up to the target size.
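(The batching rule described above can be sketched as follows; the class and method names are illustrative only, not the actual producer internals. Urgent topics bypass the target size entirely, and non-urgent topics are piggybacked only while room remains under the target.)

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch of the urgent/non-urgent batching rule: all urgent topics are always
// included in the metadata request, and non-urgent topics only fill the
// remainder up to the target size.
class MetadataBatcher {
    static List<String> buildRequestTopics(Collection<String> urgent,
                                           Collection<String> nonUrgent,
                                           int targetSize) {
        List<String> topics = new ArrayList<>(urgent);  // never capped
        for (String topic : nonUrgent) {
            if (topics.size() >= targetSize)
                break;  // target reached; remaining non-urgent topics wait
            topics.add(topic);
        }
        return topics;
    }
}
```

So 100 urgent topics still go out in a single RPC, while a lone urgent topic may carry along a few non-urgent refreshes.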
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >  Brian
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > > > > > > dengziming1993@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I think it's ok, and you can add another issue about
> > > > > > > `asynchronous
> > > > > > > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> > > > > > bbyrne@confluent.io
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hello all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've refactored the KIP to remove implementing
> > > > asynchronous
> > > > > > > > > metadata
> > > > > > > > > > > > > fetching in the producer during send(). It's now
> > > > > exclusively
> > > > > > > > > focused
> > > > > > > > > > on
> > > > > > > > > > > > > reducing the topic metadata fetch payload and
> > proposes
> > > > > > adding a
> > > > > > > > new
> > > > > > > > > > > > > configuration flag to control topic expiry
> behavior.
> > > > Please
> > > > > > > take
> > > > > > > > a
> > > > > > > > > > look
> > > > > > > > > > > > > when possible.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > > > > > > >
> > > > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Brian
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > > > > > > bbyrne@confluent.io
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Lucas, Guozhang,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for the comments. Good point on
> > > > > > > > > METADATA_MAX_AGE_CONFIG -
> > > > > > > > > > > it
> > > > > > > > > > > > > > looks like the ProducerMetadata was
> differentiating
> > > > > between
> > > > > > > > > expiry
> > > > > > > > > > > and
> > > > > > > > > > > > > > refresh, but it should be unnecessary to do so
> once
> > > the
> > > > > > cost
> > > > > > > of
> > > > > > > > > > > > fetching
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > single topic's metadata is reduced.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I've updated the rejected alternatives and
> removed
> > > the
> > > > > > config
> > > > > > > > > > > > variables.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Brian
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > > > > > > wangguoz@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> Hello Brian,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Thanks for the KIP.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I think using asynchronous metadata update to
> > > address
> > > > 1)
> > > > > > > > > metadata
> > > > > > > > > > > > update
> > > > > > > > > > > > > >> blocking send, but for other issues, currently
> at
> > > > > producer
> > > > > > > we
> > > > > > > > do
> > > > > > > > > > > have
> > > > > > > > > > > > a
> > > > > > > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar
> to
> > > > > > consumer,
> > > > > > > by
> > > > > > > > > > > default
> > > > > > > > > > > > > is
> > > > > > > > > > > > > >> 5min. So maybe we do not need to introduce new
> > > configs
> > > > > > here,
> > > > > > > > but
> > > > > > > > > > > only
> > > > > > > > > > > > > >> change the semantics of that config from global
> > > expiry
> > > > > > > (today
> > > > > > > > we
> > > > > > > > > > > just
> > > > > > > > > > > > > >> enforce a full metadata update for the whole
> > > cluster)
> > > > to
> > > > > > > > > > > single-topic
> > > > > > > > > > > > > >> expiry, and we can also extend its expiry
> deadline
> > > > > > whenever
> > > > > > > > that
> > > > > > > > > > > > > metadata
> > > > > > > > > > > > > >> is successfully used to send a produce request.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Guozhang
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet
> <
> > > > > > > > > > lucas@confluent.io
> > > > > > > > > > > >
> > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi Brian,
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > This looks great, and should help reduce
> > blocking
> > > > and
> > > > > > high
> > > > > > > > > > > metadata
> > > > > > > > > > > > > >> request
> > > > > > > > > > > > > >> > volumes when the producer is sending to large
> > > > numbers
> > > > > of
> > > > > > > > > topics,
> > > > > > > > > > > > > >> especially
> > > > > > > > > > > > > >> > at low volumes. I think the approach to make
> > > > metadata
> > > > > > > > fetching
> > > > > > > > > > > > > >> asynchronous
> > > > > > > > > > > > > >> > and batch metadata requests together will help
> > > > > > > > significantly.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > The only other approach I can think of is to
> > allow
> > > > > users
> > > > > > > to
> > > > > > > > > > supply
> > > > > > > > > > > > the
> > > > > > > > > > > > > >> > producer with the expected topics upfront,
> > > allowing
> > > > > the
> > > > > > > > > producer
> > > > > > > > > > > to
> > > > > > > > > > > > > >> perform
> > > > > > > > > > > > > >> > a single initial metadata request before any
> > sends
> > > > > > occur.
> > > > > > > I
> > > > > > > > > see
> > > > > > > > > > no
> > > > > > > > > > > > > real
> > > > > > > > > > > > > >> > advantages to this approach compared to the
> > async
> > > > > method
> > > > > > > > > you’ve
> > > > > > > > > > > > > >> proposed,
> > > > > > > > > > > > > >> > but maybe we could add it to the rejected
> > > > alternatives
> > > > > > > > > section?
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Lucas
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > > > > > > bbyrne@confluent.io>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > I've updated the 'Proposed Changes' to
> include
> > > two
> > > > > new
> > > > > > > > > > producer
> > > > > > > > > > > > > >> > > configuration variables: topic.expiry.ms
> and
> > > > > > > > > topic.refresh.ms
> > > > > > > > > > .
> > > > > > > > > > > > > Please
> > > > > > > > > > > > > >> > take
> > > > > > > > > > > > > >> > > a look.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Thanks,
> > > > > > > > > > > > > >> > > Brian
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian
> Byrne <
> > > > > > > > > > > bbyrne@confluent.io
> > > > > > > > > > > > >
> > > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > Dev team,
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Requesting discussion for improvement to
> the
> > > > > > producer
> > > > > > > > when
> > > > > > > > > > > > dealing
> > > > > > > > > > > > > >> > with a
> > > > > > > > > > > > > >> > > > large number of topics.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > KIP:
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > JIRA:
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thoughts and feedback would be
> appreciated.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > >> > > > Brian
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> --
> > > > > > > > > > > > > >> -- Guozhang
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -- Guozhang
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best,
> > > > > Stanislav
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
On Thu, Jan 2, 2020 at 12:42 PM Brian Byrne <bb...@confluent.io> wrote:

> Hi Guozhang,
>
> You're correct in that, with the current defined default values, it's
> likely we'll be more proactive in refreshing metadata for
> once/rarely-touched topics, which isn't ideal. It'd be at most 2x the rate,
> but we can certainly do better. I made the default eviction rate to mirror
> the current hard-coded value in an effort to minimize surprise, but perhaps
> there's a more suitable middle ground.
>
> Your mental model is good. I also agree that having a more aggressive
> eviction rate, or at least one that's a fair amount shorter than the topic
> refresh rate, behaves better for the once/rarely-touched topics and is more
> appropriate overall. One thing to note though is that, if metadata is
> evicted + refetched, there'll be a small latency bubble for the producer.
>
> The concern I had with shortening the eviction rate dramatically was that
> it could have unexpected consequences for clusters that keep the default,
> but whose P1 is slightly longer than the value. For example, if the
> metadata.evict.ms is 30 seconds, a periodic access rate of 40 seconds
> could
> request the topic metadata up to 7 times during a 5-minute interval. Of
> course, this is easily fixed by setting the eviction rate appropriately by
> the client, but perhaps lowering the default to a moderate ~2 minutes might
> be a better balance to avoid surprising worst-case behavior.
>
> Something else that could be considered is defining a "soft topic metadata
> cache size", which would only evict unused topics (last used >=
> metadata.evict.ms) if the soft cache size was exceeded or the metadata was
> stale. However, it still doesn't save the client from degenerate behavior
> where P1 + e ~= metadata.evict.ms, so I'm more partial to keeping the
> eviction rate to ~2+ minutes.
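Worked through, the 30-second example above comes out as follows (a standalone arithmetic sketch; the names are ours, not producer code):

```java
// Counts the metadata re-fetches forced purely by eviction for a topic that is
// produced to every accessPeriodMs, but whose cached metadata is evicted after
// evictMs of idleness. Illustrative only.
class EvictionMath {
    static int forcedRefetches(long accessPeriodMs, long evictMs, long windowMs) {
        if (accessPeriodMs <= evictMs)
            return 0;  // the topic is touched again before it can be evicted
        int refetches = 0;
        // The first send at t=0 populates the cache; every later send finds
        // the entry already evicted and must fetch again.
        for (long t = accessPeriodMs; t < windowMs; t += accessPeriodMs)
            refetches++;
        return refetches;
    }
}
```

With a 40-second access period, a 30-second eviction, and a 5-minute window this yields 7 forced re-fetches, matching the example; raising the eviction period to ~2 minutes drops it to zero.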
>
This makes sense.

I think since the scenario where a producer sends to a topic once and never
does so again is not very common, and more commonly a single producer would
only send to a very small subset of the topics hosted on the cluster during
its lifetime, we can actually use a cache size to guide when we should
consider eviction (my understanding is that the memory pressure is the only
reason we want to evict, lmk if there are other motivations). A wild
thought is that we do not even need to introduce a new config since we have
a `buffer.size` already and we can calculate the exact bytes-size of each
topic metadata entry as well, so we can, say, consider evicting only when
that buffer size is reached together with buffered data AND metadata in
bytes. Then in practice we would not need to evict metadata anyways, AND if
we do need to, having some aggressive eviction threshold is more valid (btw
I think I also agree that something like 2 minutes is fine).
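A minimal sketch of that idea (all names here, such as MetadataCache and bufferLimitBytes, are illustrative assumptions rather than actual producer internals): evict least-recently-used topic metadata only once buffered record bytes plus cached metadata bytes exceed the shared limit.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: defer metadata eviction until buffered record bytes plus cached
// metadata bytes together exceed the shared buffer limit.
class MetadataCache {
    private final long bufferLimitBytes;          // analogous to buffer.memory
    private final Map<String, Long> entryBytes = new HashMap<>();
    private final Map<String, Long> lastUsedMs = new HashMap<>();
    private long metadataBytes = 0;

    MetadataCache(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    void put(String topic, long sizeBytes, long nowMs) {
        metadataBytes += sizeBytes - entryBytes.getOrDefault(topic, 0L);
        entryBytes.put(topic, sizeBytes);
        lastUsedMs.put(topic, nowMs);
    }

    // Evict least-recently-used entries only while the combined footprint of
    // buffered records and cached metadata exceeds the limit.
    int maybeEvict(long bufferedRecordBytes) {
        int evicted = 0;
        while (metadataBytes + bufferedRecordBytes > bufferLimitBytes
                && !entryBytes.isEmpty()) {
            String lru = null;
            long oldest = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : lastUsedMs.entrySet()) {
                if (e.getValue() < oldest) {
                    oldest = e.getValue();
                    lru = e.getKey();
                }
            }
            metadataBytes -= entryBytes.remove(lru);
            lastUsedMs.remove(lru);
            evicted++;
        }
        return evicted;
    }

    int size() {
        return entryBytes.size();
    }
}
```

Under this scheme, a producer well within its memory budget never evicts metadata at all, and eviction pressure only appears when the buffer is genuinely contended.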


> Yet another option is to have topics in the "U" set (i.e. topics requesting
> non-urgent metadata update) exclude those that are nearing their eviction
> duration. I fear this might be over-engineering a potential non-problem
> though. Thoughts?
>

Yeah I also feel it is a bit overkill for this.


>
> Thanks,
> Brian
>
>
>
> On Thu, Jan 2, 2020 at 11:44 AM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello Brian,
> >
> > For the newly added `metadata.evict.ms` config, since its default value
> is
> > set as the same as `metadata.max.age.ms` and most users would not
> override
> > default config values, would it be possible for scenarios that producer
> > send to a specific topic once and never send again, we may end up
> > refreshing the metadata that are about to be removed soon? In other words
> > I'm wondering if for common cases the eviction period should be larger
> than
> > max.age or vice versa. This is my mental model:
> >
> > 1) If we evict a topic eagerly compared with refreshing it, e.g. after
> time
> > T1 has elapsed, and then after T2 (> T1) elapsed that topic is requested
> > for sending again, we would force a metadata update which includes this
> > topic (remember we now always batch a set of topics in each metadata
> > request)
> > 2) if we evict a topic more lazily compared with refreshing it, say after
> > T2 has elapsed, but before that after T1 (< T2) we have to refresh it
> > already, it is included in that metadata request.
> >
> > So from the number of metadata requests it may incur, evicting more
> eagerly
> > seems okay --- i.e. even if we do not evict this topic, it would still
> > cause a metadata request after the max.age.ms anyways so evicting it
> early
> > does not make things worse. More generally, if we know a topic's sending
> > frequency is P1 (i.e. every P1 seconds we send once for that topic), and
> > eviction period is P2, and the metadata.max.age is P3. Then the optimal
> > scenario in terms of metadata update frequency seems to be P1 = P2 < P3.
> So
> > I feel making the default value for metadata.evict.ms smaller is fine,
> > e.g.
> > say 1 minute or 30 seconds. WDYT?
> >
> >
> > Guozhang
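The P1/P2/P3 trade-off quoted above can be made concrete with a toy model (purely illustrative; it ignores batching and ahead-of-expiry refreshes): if the eviction period P2 is shorter than the send period P1, every send must re-fetch, otherwise only the max.age refresh fires.

```java
// Toy model of the P1/P2/P3 reasoning for a single topic over a time window:
// P1 = send period, P2 = eviction period, P3 = metadata.max.age.
class RequestModel {
    static long metadataRequests(long p1SendMs, long p2EvictMs, long p3MaxAgeMs,
                                 long windowMs) {
        if (p2EvictMs < p1SendMs) {
            // Evicted between consecutive sends: every send re-fetches, and
            // the background max.age refresh never applies.
            return windowMs / p1SendMs;
        }
        // Never evicted while in use: only the periodic max.age refresh fires.
        return windowMs / p3MaxAgeMs;
    }
}
```

This is consistent with both observations in the thread: eager eviction is harmless for a topic that is never sent to again, but inflates request volume whenever P2 falls below P1.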
> >
> > On Thu, Dec 26, 2019 at 7:08 AM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Hi Stanislav,
> > >
> > > Appreciate the feedback!
> > >
> > > 1. You're correct. I've added notes to the KIP to clarify.
> > >
> > > 2. Yes it should. Fixed.
> > >
> > > 3. So I made a mistake when generalizing the target refresh size, which
> > > should have been using `metadata.max.age.ms` instead of `
> > metadata.evict.ms
> > > `.
> > > Therefore, `metadata.max.age.ms` still controls the target refresh
> time
> > of
> > > active topic metadata, and doesn't change in that regard. You're right
> in
> > > that the U set will make metadata refreshing more proactive in some
> > cases,
> > > which is done to smooth out the metadata RPCs over time and lessen the
> > > potential bulk of some of the RPC responses by working in smaller
> units.
> > > The T set is what logically exists today and doesn't change.
> > >
> > > Thanks,
> > > Brian
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <
> > > stanislav@confluent.io>
> > > wrote:
> > >
> > > > Hey Brian,
> > > >
> > > > 1. Could we more explicitly clarify the behavior of the algorithm
> when
> > > `|T|
> > > > > TARGET_METADATA_FETCH SIZE` ? I assume we ignore the config in that
> > > > scenario
> > > > 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)`
> > be
> > > > `topicsPerSec * 10` ?
> > > > 3. When is this new algorithm applied? To confirm my understanding -
> > what
> > > > is the behavior of `metadata.max.age.ms` after this KIP? Are we
> > adding a
> > > > new, more proactive metadata fetch for topics in |U|?
> > > >
> > > > Thanks,
> > > > Stanislav
> > > >
> > > > On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > For all interested, please take a look at the proposed algorithm as
> > I'd
> > > > > like to get more feedback. I'll call for a vote once the break is
> > over.
> > > > >
> > > > > Thanks,
> > > > > Brian
> > > > >
> > > > > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Sounds good, I agree that should not make a big difference in
> > > practice.
> > > > > >
> > > > > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io>
> > > > wrote:
> > > > > >
> > > > > > > Hi Guozhang,
> > > > > > >
> > > > > > > I see, we agree on the topic threshold not applying to urgent
> > > topics,
> > > > > but
> > > > > > > differ slightly on what should be considered urgent. I would
> > argue
> > > > that
> > > > > > we
> > > > > > > should consider topics nearing the metadata.max.age.ms to be
> > > urgent
> > > > > > since
> > > > > > > they may still be well within the metadata.expiry.ms. That is,
> > the
> > > > > > client
> > > > > > > still considers these topics to be relevant (not expired), but
> > > > doesn't
> > > > > > want
> > > > > > > to incur the latency bubble of having to wait for the metadata
> to
> > > be
> > > > > > > re-fetched if it's stale. This could be a frequent case if the
> > > > > > > metadata.max.age.ms << metadata.expiry.ms.
> > > > > > >
> > > > > > > In practice, I wouldn't expect this to make a noticeable
> > difference
> > > > so
> > > > > I
> > > > > > > don't have a strong leaning, but the current behavior today is
> to
> > > > > > > aggressively refresh the metadata of stale topics by ensuring a
> > > > refresh
> > > > > > is
> > > > > > > triggered before that metadata.max.age.ms duration elapses.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Brian
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <
> > wangguoz@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hello Brian,
> > > > > > > >
> > > > > > > > Thanks for your explanation, could you then update the wiki
> > page
> > > > for


-- 
-- Guozhang

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hi Guozhang,

You're correct in that, with the current defined default values, it's
likely we'll be more proactive in refreshing metadata for
once/rarely-touched topics, which isn't ideal. It'd be at most 2x the rate,
but we can certainly do better. I made the default eviction period mirror
the current hard-coded value in an effort to minimize surprise, but perhaps
there's a more suitable middle ground.

Your mental model is good. I also agree that having a more aggressive
eviction rate, or at least one that's a fair amount shorter than the topic
refresh rate, behaves better for the once/rarely-touched topics and is more
appropriate overall. One thing to note though is that, if metadata is
evicted + refetched, there'll be a small latency bubble for the producer.

The concern I had with shortening the eviction period dramatically was that
it could have unexpected consequences for clients that keep the default but
whose P1 is slightly longer than that value. For example, if
metadata.evict.ms is 30 seconds, a topic accessed every 40 seconds could
have its metadata re-requested up to 7 times during a 5-minute interval. Of
course, this is easily fixed by the client setting the eviction period
appropriately, but lowering the default to a moderate ~2 minutes might be a
better balance that avoids surprising worst-case behavior.
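To make that worst case concrete, here is a toy model of the access pattern
(illustrative Python, not producer code; it assumes metadata is fetched only
at send time and ignores staleness-driven refreshes):

```python
def count_fetches(access_period_s, evict_s, horizon_s):
    """Count metadata fetches for a topic that is sent to every
    access_period_s seconds, when its cached metadata is evicted
    after evict_s seconds of inactivity."""
    fetches = 0
    last_access = None
    for t in range(0, horizon_s + 1, access_period_s):
        # If the topic sat idle longer than the eviction period, its
        # metadata is gone and this send must block on a fresh fetch.
        if last_access is None or t - last_access > evict_s:
            fetches += 1
        last_access = t
    return fetches
```

With a 30-second eviction period and a 40-second send interval, every send
over five minutes pays a fetch (8 including the initial one, i.e. 7
re-fetches), whereas a 120-second eviction period needs only the initial
fetch.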

Something else that could be considered is defining a "soft topic metadata
cache size", which would only evict unused topics (last used >=
metadata.evict.ms) if the soft cache size was exceeded or the metadata was
stale. However, it still doesn't save the client from degenerate behavior
where P1 + e ~= metadata.evict.ms, so I'm more partial to keeping the
eviction rate to ~2+ minutes.
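A rough sketch of that soft-cache-size idea (names like soft_max_size are
mine, and this is not the actual ProducerMetadata interface):

```python
import time

def topics_to_evict(cache, soft_max_size, evict_ms, max_age_ms, now_ms=None):
    """Pick unused topics (idle >= evict_ms) to drop, but only when the
    cache exceeds soft_max_size or the topic's metadata is already stale.
    `cache` maps topic -> (last_used_ms, last_refreshed_ms)."""
    if now_ms is None:
        now_ms = time.time() * 1000
    over_budget = len(cache) > soft_max_size
    evict = []
    for topic, (last_used, last_refreshed) in cache.items():
        idle_long_enough = now_ms - last_used >= evict_ms
        stale = now_ms - last_refreshed >= max_age_ms
        if idle_long_enough and (over_budget or stale):
            evict.append(topic)
    return evict
```

Under this policy an idle topic survives as long as memory allows, which
sidesteps the degenerate periodic-access case for small topic counts,
though, as noted, not in general.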

Yet another option is to have topics in the "U" set (i.e. topics requesting
non-urgent metadata update) exclude those that are nearing their eviction
duration. I fear this might be over-engineering a potential non-problem
though. Thoughts?

Thanks,
Brian



On Thu, Jan 2, 2020 at 11:44 AM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Brian,
>
> For the newly added `metadata.evict.ms` config, since its default value is
> the same as `metadata.max.age.ms` and most users will not override default
> config values, could it happen that, for a producer that sends to a
> specific topic once and never again, we end up refreshing metadata that is
> about to be evicted anyway? In other words
> I'm wondering if for common cases the eviction period should be larger than
> max.age or vice versa. This is my mental model:
>
> 1) If we evict a topic eagerly compared with refreshing it, e.g. after time
> T1 has elapsed, and then after T2 (> T1) elapsed that topic is requested
> for sending again, we would force a metadata update which includes this
> topic (remember we now always batch a set of topics in each metadata
> request)
> 2) if we evict a topic more lazily than we refresh it, say after T2 has
> elapsed, then before that point, after T1 (< T2) has elapsed, we have to
> refresh it anyway, and it is included in that metadata request.
>
> So from the number of metadata requests it may incur, evicting more eagerly
> seems okay --- i.e. even if we do not evict this topic, it would still
> cause a metadata request after the max.age.ms anyways so evicting it early
> does not make things worse. More generally, if we know a topic's sending
> frequency is P1 (i.e. every P1 seconds we send once for that topic), and
> eviction period is P2, and the metadata.max.age is P3. Then the optimal
> scenario in terms of metadata update frequency seems to be P1 = P2 < P3. So
> I feel making the default value for metadata.evict.ms smaller is fine,
> e.g.
> say 1minute or 30seconds. WDYT?
>
>
> Guozhang
>
> On Thu, Dec 26, 2019 at 7:08 AM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hi Stanislav,
> >
> > Appreciate the feedback!
> >
> > 1. You're correct. I've added notes to the KIP to clarify.
> >
> > 2. Yes it should. Fixed.
> >
> > 3. So I made a mistake when generalizing the target refresh size, which
> > should have been using `metadata.max.age.ms` instead of `
> metadata.evict.ms
> > `.
> > Therefore, `metadata.max.age.ms` still controls the target refresh time
> of
> > active topic metadata, and doesn't change in that regard. You're right in
> > that the U set will make metadata refreshing more proactive in some
> cases,
> > which is done to smooth out the metadata RPCs over time and lessen the
> > potential bulk of some of the RPC responses by working in smaller units.
> > The T set is what logically exists today and doesn't change.
> >
> > Thanks,
> > Brian
> >
> >
> >
> >
> >
> > On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <
> > stanislav@confluent.io>
> > wrote:
> >
> > > Hey Brian,
> > >
> > > 1. Could we more explicitly clarify the behavior of the algorithm when
> > `|T|
> > > > TARGET_METADATA_FETCH SIZE` ? I assume we ignore the config in that
> > > scenario
> > > 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)`
> be
> > > `topicsPerSec * 10` ?
> > > 3. When is this new algorithm applied? To confirm my understanding -
> what
> > > is the behavior of `metadata.max.age.ms` after this KIP? Are we
> adding a
> > > new, more proactive metadata fetch for topics in |U|?
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io>
> > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > For all interested, please take a look at the proposed algorithm as
> I'd
> > > > like to get more feedback. I'll call for a vote once the break is
> over.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > > >
> > > > > Sounds good, I agree that should not make a big difference in
> > practice.
> > > > >
> > > > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > I see, we agree on the topic threshold not applying to urgent
> > topics,
> > > > but
> > > > > > differ slightly on what should be considered urgent. I would
> argue
> > > that
> > > > > we
> > > > > > should consider topics nearing the metadata.max.age.ms to be
> > urgent
> > > > > since
> > > > > > they may still be well within the metadata.expiry.ms. That is,
> the
> > > > > client
> > > > > > still considers these topics to be relevant (not expired), but
> > > doesn't
> > > > > want
> > > > > > to incur the latency bubble of having to wait for the metadata to
> > be
> > > > > > re-fetched if it's stale. This could be a frequent case if the
> > > > > > metadata.max.age.ms << metadata.expiry.ms.
> > > > > >
> > > > > > In practice, I wouldn't expect this to make a noticeable
> difference
> > > so
> > > > I
> > > > > > don't have a strong leaning, but the current behavior today is to
> > > > > > aggressively refresh the metadata of stale topics by ensuring a
> > > refresh
> > > > > is
> > > > > > triggered before that metadata.max.age.ms duration elapses.
> > > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <
> wangguoz@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Hello Brian,
> > > > > > >
> > > > > > > Thanks for your explanation, could you then update the wiki
> page
> > > for
> > > > > the
> > > > > > > algorithm part since when I read it, I thought it was different
> > > from
> > > > > the
> > > > > > > above, e.g. urgent topics should not be added just because of
> > > max.age
> > > > > > > expiration, but should only be added if there are sending data
> > > > pending.
> > > > > > >
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <
> bbyrne@confluent.io
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Guozhang,
> > > > > > > >
> > > > > > > > Thanks for the feedback!
> > > > > > > >
> > > > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <
> > wangguoz@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > 1. The addition of *metadata.expiry.ms <
> > > > http://metadata.expiry.ms>
> > > > > > > > *should
> > > > > > > > > be included in the public interface. Also its semantics
> needs
> > > > more
> > > > > > > > > clarification (since previously it is hard-coded internally
> > we
> > > do
> > > > > not
> > > > > > > > need
> > > > > > > > > to explain it publicly, but now with the configurable value
> > we
> > > do
> > > > > > > need).
> > > > > > > > >
> > > > > > > >
> > > > > > > > This was an oversight. Done.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 2. There are a couple of hard-coded parameters like 25 and
> > 0.5
> > > in
> > > > > the
> > > > > > > > > proposal, maybe we need to explain why these magic values
> > makes
> > > > > sense
> > > > > > > in
> > > > > > > > > common scenarios.
> > > > > > > > >
> > > > > > > >
> > > > > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> > > > balance
> > > > > > > > between trade-offs. I've updated the target size to account
> for
> > > > > setups
> > > > > > > with
> > > > > > > > a large number of topics or a shorter refresh time, as well
> as
> > > > added
> > > > > > some
> > > > > > > > light rationale.
> > > > > > > >
> > > > > > > >
> > > > > > > > > 3. In the Urgent set condition, do you actually mean "with
> no
> > > > > cached
> > > > > > > > > metadata AND there are existing data buffered for the
> topic"?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, fixed.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > One concern I have is whether or not we may introduce a
> > > > regression,
> > > > > > > > > especially during producer startup such that since we only
> > > > require
> > > > > up
> > > > > > > to
> > > > > > > > 25
> > > > > > > > > topics each request, it may cause the send data to be
> > buffered
> > > > more
> > > > > > > time
> > > > > > > > > than now due to metadata not available. I understand this
> is
> > a
> > > > > > > > acknowledged
> > > > > > > > > trade-off in our design but any regression that may surface
> > to
> > > > > users
> > > > > > > need
> > > > > > > > > to be very carefully considered. I'm wondering, e.g. if we
> > can
> > > > > tweak
> > > > > > > our
> > > > > > > > > algorithm for the Urgent set, e.g. to consider those with
> non
> > > > > cached
> > > > > > > > > metadata have higher priority than those who have elapsed
> > > max.age
> > > > > but
> > > > > > > not
> > > > > > > > > yet have been called for sending. More specifically:
> > > > > > > > >
> > > > > > > > > Urgent: topics that have been requested for sending but no
> > > cached
> > > > > > > > metadata,
> > > > > > > > > and topics that have send request failed with e.g.
> > NOT_LEADER.
> > > > > > > > > Non-urgent: topics that are not in Urgent but have expired
> > > > max.age.
> > > > > > > > >
> > > > > > > > > Then when sending metadata, we always send ALL in the
> urgent
> > > > (i.e.
> > > > > > > ignore
> > > > > > > > > the size limit), and only when they do not exceed the size
> > > limit,
> > > > > > > > consider
> > > > > > > > > fill in more topics from Non-urgent up to the size limit.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think we're on the same page here. Urgent topics ignore the
> > > > target
> > > > > > > > metadata RPC size, and are not bound by it in any way, i.e.
> if
> > > > > there's
> > > > > > > 100
> > > > > > > > urgent topics, we'll fetch all 100 in a single RPC. Like you
> > say,
> > > > > > > however,
> > > > > > > > if a topic transitions to urgent and there's several
> non-urgent
> > > > ones,
> > > > > > > we'll
> > > > > > > > piggyback the non-urgent updates up to the target size.
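That selection rule being converged on — send ALL urgent topics regardless
of the target size, then piggyback non-urgent ones up to it — could be
sketched as (illustrative only, not the actual producer code):

```python
def choose_topics(urgent, non_urgent, target_size):
    """Build the topic set for one metadata request: every urgent topic
    unconditionally, then non-urgent topics until target_size is reached."""
    request = list(urgent)  # urgent topics ignore the size cap entirely
    for topic in non_urgent:
        if len(request) >= target_size:
            break
        request.append(topic)
    return request
```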
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >  Brian
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > > > > dengziming1993@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I think it's ok, and you can add another issue about
> > > > > `asynchronous
> > > > > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> > > > bbyrne@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello all,
> > > > > > > > > > >
> > > > > > > > > > > I've refactored the KIP to remove implementing
> > asynchronous
> > > > > > > metadata
> > > > > > > > > > > fetching in the producer during send(). It's now
> > > exclusively
> > > > > > > focused
> > > > > > > > on
> > > > > > > > > > > reducing the topic metadata fetch payload and proposes
> > > > adding a
> > > > > > new
> > > > > > > > > > > configuration flag to control topic expiry behavior.
> > Please
> > > > > take
> > > > > > a
> > > > > > > > look
> > > > > > > > > > > when possible.
> > > > > > > > > > >
> > > > > > > > > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > > > > >
> > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Brian
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > > > > bbyrne@confluent.io
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Lucas, Guozhang,
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you for the comments. Good point on
> > > > > > > METADATA_MAX_AGE_CONFIG -
> > > > > > > > > it
> > > > > > > > > > > > looks like the ProducerMetadata was differentiating
> > > between
> > > > > > > expiry
> > > > > > > > > and
> > > > > > > > > > > > refresh, but it should be unnecessary to do so once
> the
> > > > cost
> > > > > of
> > > > > > > > > > fetching
> > > > > > > > > > > a
> > > > > > > > > > > > single topic's metadata is reduced.
> > > > > > > > > > > >
> > > > > > > > > > > > I've updated the rejected alternatives and removed
> the
> > > > config
> > > > > > > > > > variables.
> > > > > > > > > > > >
> > > > > > > > > > > > Brian
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > > > > wangguoz@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> Hello Brian,
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks for the KIP.
> > > > > > > > > > > >>
> > > > > > > > > > > >> I think using asynchronous metadata update to
> address
> > 1)
> > > > > > > metadata
> > > > > > > > > > update
> > > > > > > > > > > >> blocking send, but for other issues, currently at
> > > producer
> > > > > we
> > > > > > do
> > > > > > > > > have
> > > > > > > > > > a
> > > > > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to
> > > > consumer,
> > > > > by
> > > > > > > > > default
> > > > > > > > > > > is
> > > > > > > > > > > >> 5min. So maybe we do not need to introduce new
> configs
> > > > here,
> > > > > > but
> > > > > > > > > only
> > > > > > > > > > > >> change the semantics of that config from global
> expiry
> > > > > (today
> > > > > > we
> > > > > > > > > just
> > > > > > > > > > > >> enforce a full metadata update for the whole
> cluster)
> > to
> > > > > > > > > single-topic
> > > > > > > > > > > >> expiry, and we can also extend its expiry deadline
> > > > whenever
> > > > > > that
> > > > > > > > > > > metadata
> > > > > > > > > > > >> is successfully used to send a produce request.
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> Guozhang
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > > > > > > lucas@confluent.io
> > > > > > > > > >
> > > > > > > > > > > >> wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Hi Brian,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > This looks great, and should help reduce blocking
> > and
> > > > high
> > > > > > > > > metadata
> > > > > > > > > > > >> request
> > > > > > > > > > > >> > volumes when the producer is sending to large
> > numbers
> > > of
> > > > > > > topics,
> > > > > > > > > > > >> especially
> > > > > > > > > > > >> > at low volumes. I think the approach to make
> > metadata
> > > > > > fetching
> > > > > > > > > > > >> asynchronous
> > > > > > > > > > > >> > and batch metadata requests together will help
> > > > > > significantly.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > The only other approach I can think of is to allow
> > > users
> > > > > to
> > > > > > > > supply
> > > > > > > > > > the
> > > > > > > > > > > >> > producer with the expected topics upfront,
> allowing
> > > the
> > > > > > > producer
> > > > > > > > > to
> > > > > > > > > > > >> perform
> > > > > > > > > > > >> > a single initial metadata request before any sends
> > > > occur.
> > > > > I
> > > > > > > see
> > > > > > > > no
> > > > > > > > > > > real
> > > > > > > > > > > >> > advantages to this approach compared to the async
> > > method
> > > > > > > you’ve
> > > > > > > > > > > >> proposed,
> > > > > > > > > > > >> > but maybe we could add it to the rejected
> > alternatives
> > > > > > > section?
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Thanks,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Lucas
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > > > > bbyrne@confluent.io>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > I've updated the 'Proposed Changes' to include
> two
> > > new
> > > > > > > > producer
> > > > > > > > > > > >> > > configuration variables: topic.expiry.ms and
> > > > > > > topic.refresh.ms
> > > > > > > > .
> > > > > > > > > > > Please
> > > > > > > > > > > >> > take
> > > > > > > > > > > >> > > a look.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Thanks,
> > > > > > > > > > > >> > > Brian
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > > > > > > bbyrne@confluent.io
> > > > > > > > > > >
> > > > > > > > > > > >> > wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > > Dev team,
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Requesting discussion for improvement to the
> > > > producer
> > > > > > when
> > > > > > > > > > dealing
> > > > > > > > > > > >> > with a
> > > > > > > > > > > >> > > > large number of topics.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > KIP:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > JIRA:
> > > > > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > >> > > > Brian
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > > > > > > > > > > >>
> > > > > > > > > > > >> --
> > > > > > > > > > > >> -- Guozhang
> > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -- Guozhang
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Brian,

For the newly added `metadata.evict.ms` config, since its default value is
the same as `metadata.max.age.ms` and most users will not override default
config values, could it happen that, for a producer that sends to a
specific topic once and never again, we end up refreshing metadata that is
about to be evicted anyway? In other words
I'm wondering if for common cases the eviction period should be larger than
max.age or vice versa. This is my mental model:

1) If we evict a topic eagerly, compared with refreshing it, e.g. after
time T1 has elapsed, and then after T2 (> T1) has elapsed that topic is
requested for sending again, we would force a metadata update that
includes this topic (remember we now always batch a set of topics in each
metadata request).
2) If we evict a topic more lazily, compared with refreshing it, say after
T2 has elapsed, then before that, after T1 (< T2), we had to refresh it
anyway, so it is included in that metadata request.

So in terms of the number of metadata requests it may incur, evicting more
eagerly seems okay: even if we do not evict this topic, it would still
cause a metadata request after metadata.max.age.ms anyway, so evicting it
early does not make things worse. More generally, if we know a topic's
sending frequency is P1 (i.e. every P1 seconds we send once to that
topic), the eviction period is P2, and metadata.max.age.ms is P3, then the
optimal scenario in terms of metadata update frequency seems to be P1 = P2
< P3. So I feel making the default value for metadata.evict.ms smaller is
fine, e.g. 1 minute or 30 seconds. WDYT?
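To make the trade-off concrete, here is a toy simulation (not Kafka code;
the method and parameter names are made up for illustration) that counts
metadata requests for a single topic under the assumed model: a blocking
fetch on send when the metadata is not cached, a background refresh every
`maxAgeSec` while cached, and eviction after `evictSec` of idleness. For a
topic sent once and never again, a lazy eviction period keeps refreshing
metadata that will never be used, while an eager one stops after the first
request:

```java
import java.util.Set;

public class EvictionModel {
    /** Count metadata requests for one topic over [0, horizon] seconds, 1s ticks. */
    static int countRequests(Set<Integer> sendTimes, int evictSec, int maxAgeSec, int horizon) {
        int requests = 0, lastUse = 0, lastRefresh = 0;
        boolean cached = false;
        for (int t = 0; t <= horizon; t++) {
            if (sendTimes.contains(t)) {
                if (!cached) { cached = true; lastRefresh = t; requests++; } // blocking fetch
                lastUse = t;
            }
            if (cached && t - lastUse >= evictSec) {
                cached = false;                                   // evicted: no more refreshes
            } else if (cached && t - lastRefresh >= maxAgeSec) {
                lastRefresh = t; requests++;                      // metadata.max.age.ms refresh
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        Set<Integer> sendOnce = Set.of(0);
        // Lazy eviction (evict after 3600s, refresh every 300s): the cached entry
        // is refreshed at t=300, 600, ..., 3300 before finally being evicted.
        System.out.println(countRequests(sendOnce, 3600, 300, 3600)); // prints 12
        // Eager eviction (evict after 30s): one request total.
        System.out.println(countRequests(sendOnce, 30, 300, 3600));   // prints 1
    }
}
```

This matches the argument above: for a rarely-sent topic, a smaller
eviction period strictly reduces wasted refreshes, and for a frequently
sent topic it adds nothing, because the refresh would happen anyway.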


Guozhang

On Thu, Dec 26, 2019 at 7:08 AM Brian Byrne <bb...@confluent.io> wrote:

> Hi Stanislav,
>
> Appreciate the feedback!
>
> 1. You're correct. I've added notes to the KIP to clarify.
>
> 2. Yes it should. Fixed.
>
> 3. So I made a mistake when generalizing the target refresh size, which
> should have been using `metadata.max.age.ms` instead of `metadata.evict.ms
> `.
> Therefore, `metadata.max.age.ms` still controls the target refresh time of
> active topic metadata, and doesn't change in that regard. You're right in
> that the U set will make metadata refreshing more proactive in some cases,
> which is done to smooth out the metadata RPCs over time and lessen the
> potential bulk of some of the RPC responses by working in smaller units.
> The T set is what logically exists today and doesn't change.
>
> Thanks,
> Brian
>
>
>
>
>
> On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <
> stanislav@confluent.io>
> wrote:
>
> > Hey Brian,
> >
> > 1. Could we more explicitly clarify the behavior of the algorithm when
> `|T|
> > > TARGET_METADATA_FETCH SIZE` ? I assume we ignore the config in that
> > scenario
> > 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be
> > `topicsPerSec * 10` ?
> > 3. When is this new algorithm applied? To confirm my understanding - what
> > is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a
> > new, more proactive metadata fetch for topics in |U|?
> >
> > Thanks,
> > Stanislav
> >
> > On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io>
> wrote:
> >
> > > Hello everyone,
> > >
> > > For all interested, please take a look at the proposed algorithm as I'd
> > > like to get more feedback. I'll call for a vote once the break is over.
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > > > Sounds good, I agree that should not make a big difference in
> practice.
> > > >
> > > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io>
> > wrote:
> > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > I see, we agree on the topic threshold not applying to urgent
> topics,
> > > but
> > > > > differ slightly on what should be considered urgent. I would argue
> > that
> > > > we
> > > > > should consider topics nearing the metadata.max.age.ms to be
> urgent
> > > > since
> > > > > they may still be well within the metadata.expiry.ms. That is, the
> > > > client
> > > > > still considers these topics to be relevant (not expired), but
> > doesn't
> > > > want
> > > > > to incur the latency bubble of having to wait for the metadata to
> be
> > > > > re-fetched if it's stale. This could be a frequent case if the
> > > > > metadata.max.age.ms << metadata.expiry.ms.
> > > > >
> > > > > In practice, I wouldn't expect this to make a noticeable difference
> > so
> > > I
> > > > > don't have a strong leaning, but the current behavior today is to
> > > > > aggressively refresh the metadata of stale topics by ensuring a
> > refresh
> > > > is
> > > > > triggered before that metadata.max.age.ms duration elapses.
> > > > >
> > > > > Thanks,
> > > > > Brian
> > > > >
> > > > >
> > > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hello Brian,
> > > > > >
> > > > > > Thanks for your explanation, could you then update the wiki page
> > for
> > > > the
> > > > > > algorithm part since when I read it, I thought it was different
> > from
> > > > the
> > > > > > above, e.g. urgent topics should not be added just because of
> > max.age
> > > > > > expiration, but should only be added if there are sending data
> > > pending.
> > > > > >
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bbyrne@confluent.io
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Guozhang,
> > > > > > >
> > > > > > > Thanks for the feedback!
> > > > > > >
> > > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <
> wangguoz@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > 1. The addition of *metadata.expiry.ms <
> > > http://metadata.expiry.ms>
> > > > > > > *should
> > > > > > > > be included in the public interface. Also its semantics needs
> > > more
> > > > > > > > clarification (since previously it is hard-coded internally
> we
> > do
> > > > not
> > > > > > > need
> > > > > > > > to explain it publicly, but now with the configurable value
> we
> > do
> > > > > > need).
> > > > > > > >
> > > > > > >
> > > > > > > This was an oversight. Done.
> > > > > > >
> > > > > > >
> > > > > > > > 2. There are a couple of hard-coded parameters like 25 and
> 0.5
> > in
> > > > the
> > > > > > > > proposal, maybe we need to explain why these magic values
> makes
> > > > sense
> > > > > > in
> > > > > > > > common scenarios.
> > > > > > > >
> > > > > > >
> > > > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> > > balance
> > > > > > > between trade-offs. I've updated the target size to account for
> > > > setups
> > > > > > with
> > > > > > > a large number of topics or a shorter refresh time, as well as
> > > added
> > > > > some
> > > > > > > light rationale.
> > > > > > >
> > > > > > >
> > > > > > > > 3. In the Urgent set condition, do you actually mean "with no
> > > > cached
> > > > > > > > metadata AND there are existing data buffered for the topic"?
> > > > > > > >
> > > > > > >
> > > > > > > Yes, fixed.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > One concern I have is whether or not we may introduce a
> > > regression,
> > > > > > > > especially during producer startup such that since we only
> > > require
> > > > up
> > > > > > to
> > > > > > > 25
> > > > > > > > topics each request, it may cause the send data to be
> buffered
> > > more
> > > > > > time
> > > > > > > > than now due to metadata not available. I understand this is
> a
> > > > > > > acknowledged
> > > > > > > > trade-off in our design but any regression that may surface
> to
> > > > users
> > > > > > need
> > > > > > > > to be very carefully considered. I'm wondering, e.g. if we
> can
> > > > tweak
> > > > > > our
> > > > > > > > algorithm for the Urgent set, e.g. to consider those with non
> > > > cached
> > > > > > > > metadata have higher priority than those who have elapsed
> > max.age
> > > > but
> > > > > > not
> > > > > > > > yet have been called for sending. More specifically:
> > > > > > > >
> > > > > > > > Urgent: topics that have been requested for sending but no
> > cached
> > > > > > > metadata,
> > > > > > > > and topics that have send request failed with e.g.
> NOT_LEADER.
> > > > > > > > Non-urgent: topics that are not in Urgent but have expired
> > > max.age.
> > > > > > > >
> > > > > > > > Then when sending metadata, we always send ALL in the urgent
> > > (i.e.
> > > > > > ignore
> > > > > > > > the size limit), and only when they do not exceed the size
> > limit,
> > > > > > > consider
> > > > > > > > fill in more topics from Non-urgent up to the size limit.
> > > > > > > >
> > > > > > >
> > > > > > > I think we're on the same page here. Urgent topics ignore the
> > > target
> > > > > > > metadata RPC size, and are not bound by it in any way, i.e. if
> > > > there's
> > > > > > 100
> > > > > > > urgent topics, we'll fetch all 100 in a single RPC. Like you
> say,
> > > > > > however,
> > > > > > > if a topic transitions to urgent and there's several non-urgent
> > > ones,
> > > > > > we'll
> > > > > > > piggyback the non-urgent updates up to the target size.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >  Brian
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > > > dengziming1993@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I think it's ok, and you can add another issue about
> > > > `asynchronous
> > > > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> > > bbyrne@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello all,
> > > > > > > > > >
> > > > > > > > > > I've refactored the KIP to remove implementing
> asynchronous
> > > > > > metadata
> > > > > > > > > > fetching in the producer during send(). It's now
> > exclusively
> > > > > > focused
> > > > > > > on
> > > > > > > > > > reducing the topic metadata fetch payload and proposes
> > > adding a
> > > > > new
> > > > > > > > > > configuration flag to control topic expiry behavior.
> Please
> > > > take
> > > > > a
> > > > > > > look
> > > > > > > > > > when possible.
> > > > > > > > > >
> > > > > > > > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > > > >
> > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Brian
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > > > bbyrne@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Lucas, Guozhang,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for the comments. Good point on
> > > > > > METADATA_MAX_AGE_CONFIG -
> > > > > > > > it
> > > > > > > > > > > looks like the ProducerMetadata was differentiating
> > between
> > > > > > expiry
> > > > > > > > and
> > > > > > > > > > > refresh, but it should be unnecessary to do so once the
> > > cost
> > > > of
> > > > > > > > > fetching
> > > > > > > > > > a
> > > > > > > > > > > single topic's metadata is reduced.
> > > > > > > > > > >
> > > > > > > > > > > I've updated the rejected alternatives and removed the
> > > config
> > > > > > > > > variables.
> > > > > > > > > > >
> > > > > > > > > > > Brian
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > > > wangguoz@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Hello Brian,
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks for the KIP.
> > > > > > > > > > >>
> > > > > > > > > > >> I think using asynchronous metadata update to address
> 1)
> > > > > > metadata
> > > > > > > > > update
> > > > > > > > > > >> blocking send, but for other issues, currently at
> > producer
> > > > we
> > > > > do
> > > > > > > > have
> > > > > > > > > a
> > > > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to
> > > consumer,
> > > > by
> > > > > > > > default
> > > > > > > > > > is
> > > > > > > > > > >> 5min. So maybe we do not need to introduce new configs
> > > here,
> > > > > but
> > > > > > > > only
> > > > > > > > > > >> change the semantics of that config from global expiry
> > > > (today
> > > > > we
> > > > > > > > just
> > > > > > > > > > >> enforce a full metadata update for the whole cluster)
> to
> > > > > > > > single-topic
> > > > > > > > > > >> expiry, and we can also extend its expiry deadline
> > > whenever
> > > > > that
> > > > > > > > > > metadata
> > > > > > > > > > >> is successfully used to send a produce request.
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> Guozhang
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > > > > > lucas@confluent.io
> > > > > > > > >
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi Brian,
> > > > > > > > > > >> >
> > > > > > > > > > >> > This looks great, and should help reduce blocking
> and
> > > high
> > > > > > > > metadata
> > > > > > > > > > >> request
> > > > > > > > > > >> > volumes when the producer is sending to large
> numbers
> > of
> > > > > > topics,
> > > > > > > > > > >> especially
> > > > > > > > > > >> > at low volumes. I think the approach to make
> metadata
> > > > > fetching
> > > > > > > > > > >> asynchronous
> > > > > > > > > > >> > and batch metadata requests together will help
> > > > > significantly.
> > > > > > > > > > >> >
> > > > > > > > > > >> > The only other approach I can think of is to allow
> > users
> > > > to
> > > > > > > supply
> > > > > > > > > the
> > > > > > > > > > >> > producer with the expected topics upfront, allowing
> > the
> > > > > > producer
> > > > > > > > to
> > > > > > > > > > >> perform
> > > > > > > > > > >> > a single initial metadata request before any sends
> > > occur.
> > > > I
> > > > > > see
> > > > > > > no
> > > > > > > > > > real
> > > > > > > > > > >> > advantages to this approach compared to the async
> > method
> > > > > > you’ve
> > > > > > > > > > >> proposed,
> > > > > > > > > > >> > but maybe we could add it to the rejected
> alternatives
> > > > > > section?
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks,
> > > > > > > > > > >> >
> > > > > > > > > > >> > Lucas
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > > > bbyrne@confluent.io>
> > > > > > > > > > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > I've updated the 'Proposed Changes' to include two
> > new
> > > > > > > producer
> > > > > > > > > > >> > > configuration variables: topic.expiry.ms and
> > > > > > topic.refresh.ms
> > > > > > > .
> > > > > > > > > > Please
> > > > > > > > > > >> > take
> > > > > > > > > > >> > > a look.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Thanks,
> > > > > > > > > > >> > > Brian
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > > > > > bbyrne@confluent.io
> > > > > > > > > >
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Dev team,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Requesting discussion for improvement to the
> > > producer
> > > > > when
> > > > > > > > > dealing
> > > > > > > > > > >> > with a
> > > > > > > > > > >> > > > large number of topics.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > KIP:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > JIRA:
> > > > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > >> > > > Brian
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> --
> > > > > > > > > > >> -- Guozhang
> > > > > > > > > > >>
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > -- Guozhang
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hi Stanislav,

Appreciate the feedback!

1. You're correct. I've added notes to the KIP to clarify.

2. Yes it should. Fixed.

3. So I made a mistake when generalizing the target refresh size: it
should have used `metadata.max.age.ms` instead of `metadata.evict.ms`.
Therefore, `metadata.max.age.ms` still controls the target refresh time of
active topic metadata, and doesn't change in that regard. You're right in
that the U set will make metadata refreshing more proactive in some cases,
which is done to smooth out the metadata RPCs over time and lessen the
potential bulk of some of the RPC responses by working in smaller units.
The T set is what logically exists today and doesn't change.
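The batching rule discussed further down in this thread (fetch all urgent
topics unconditionally, and piggyback non-urgent ones only up to the
target size) can be sketched roughly as below. This is an illustrative
sketch, not the producer's actual internals; `urgent`, `nonUrgent`, and
`targetFetchSize` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class MetadataBatcher {
    /**
     * Choose topics for the next metadata request: every urgent topic is
     * always included (the target size never delays them), while non-urgent
     * topics are piggybacked only while the batch is under the target size.
     */
    static List<String> nextFetch(List<String> urgent, List<String> nonUrgent,
                                  int targetFetchSize) {
        List<String> batch = new ArrayList<>(urgent);      // never capped
        for (String topic : nonUrgent) {
            if (batch.size() >= targetFetchSize) break;    // cap applies here only
            batch.add(topic);
        }
        return batch;
    }

    public static void main(String[] args) {
        // 3 urgent + 5 non-urgent with a target of 4: all urgent plus one piggybacked.
        System.out.println(nextFetch(List.of("a", "b", "c"),
                                     List.of("d", "e", "f", "g", "h"), 4));
        // 100 urgent topics with a target of 25: all 100 go in one RPC.
        System.out.println(nextFetch(java.util.Collections.nCopies(100, "t"),
                                     List.of(), 25).size());
    }
}
```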

Thanks,
Brian





On Thu, Dec 26, 2019 at 2:16 AM Stanislav Kozlovski <st...@confluent.io>
wrote:

> Hey Brian,
>
> 1. Could we more explicitly clarify the behavior of the algorithm when `|T|
> > TARGET_METADATA_FETCH SIZE` ? I assume we ignore the config in that
> scenario
> 2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be
> `topicsPerSec * 10` ?
> 3. When is this new algorithm applied? To confirm my understanding - what
> is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a
> new, more proactive metadata fetch for topics in |U|?
>
> Thanks,
> Stanislav
>
> On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hello everyone,
> >
> > For all interested, please take a look at the proposed algorithm as I'd
> > like to get more feedback. I'll call for a vote once the break is over.
> >
> > Thanks,
> > Brian
> >
> > On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Sounds good, I agree that should not make a big difference in practice.
> > >
> > > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io>
> wrote:
> > >
> > > > Hi Guozhang,
> > > >
> > > > I see, we agree on the topic threshold not applying to urgent topics,
> > but
> > > > differ slightly on what should be considered urgent. I would argue
> that
> > > we
> > > > should consider topics nearing the metadata.max.age.ms to be urgent
> > > since
> > > > they may still be well within the metadata.expiry.ms. That is, the
> > > client
> > > > still considers these topics to be relevant (not expired), but
> doesn't
> > > want
> > > > to incur the latency bubble of having to wait for the metadata to be
> > > > re-fetched if it's stale. This could be a frequent case if the
> > > > metadata.max.age.ms << metadata.expiry.ms.
> > > >
> > > > In practice, I wouldn't expect this to make a noticeable difference
> so
> > I
> > > > don't have a strong leaning, but the current behavior today is to
> > > > aggressively refresh the metadata of stale topics by ensuring a
> refresh
> > > is
> > > > triggered before that metadata.max.age.ms duration elapses.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > >
> > > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > > >
> > > > > Hello Brian,
> > > > >
> > > > > Thanks for your explanation, could you then update the wiki page
> for
> > > the
> > > > > algorithm part since when I read it, I thought it was different
> from
> > > the
> > > > > above, e.g. urgent topics should not be added just because of
> max.age
> > > > > expiration, but should only be added if there are sending data
> > pending.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Hi Guozhang,
> > > > > >
> > > > > > Thanks for the feedback!
> > > > > >
> > > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wangguoz@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > 1. The addition of *metadata.expiry.ms <
> > http://metadata.expiry.ms>
> > > > > > *should
> > > > > > > be included in the public interface. Also its semantics needs
> > more
> > > > > > > clarification (since previously it is hard-coded internally we
> do
> > > not
> > > > > > need
> > > > > > > to explain it publicly, but now with the configurable value we
> do
> > > > > need).
> > > > > > >
> > > > > >
> > > > > > This was an oversight. Done.
> > > > > >
> > > > > >
> > > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5
> in
> > > the
> > > > > > > proposal, maybe we need to explain why these magic values makes
> > > sense
> > > > > in
> > > > > > > common scenarios.
> > > > > > >
> > > > > >
> > > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> > balance
> > > > > > between trade-offs. I've updated the target size to account for
> > > setups
> > > > > with
> > > > > > a large number of topics or a shorter refresh time, as well as
> > added
> > > > some
> > > > > > light rationale.
> > > > > >
> > > > > >
> > > > > > > 3. In the Urgent set condition, do you actually mean "with no
> > > cached
> > > > > > > metadata AND there are existing data buffered for the topic"?
> > > > > > >
> > > > > >
> > > > > > Yes, fixed.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > One concern I have is whether or not we may introduce a
> > regression,
> > > > > > > especially during producer startup such that since we only
> > require
> > > up
> > > > > to
> > > > > > 25
> > > > > > > topics each request, it may cause the send data to be buffered
> > more
> > > > > time
> > > > > > > than now due to metadata not available. I understand this is a
> > > > > > acknowledged
> > > > > > > trade-off in our design but any regression that may surface to
> > > users
> > > > > need
> > > > > > > to be very carefully considered. I'm wondering, e.g. if we can
> > > tweak
> > > > > our
> > > > > > > algorithm for the Urgent set, e.g. to consider those with non
> > > cached
> > > > > > > metadata have higher priority than those who have elapsed
> max.age
> > > but
> > > > > not
> > > > > > > yet have been called for sending. More specifically:
> > > > > > >
> > > > > > > Urgent: topics that have been requested for sending but no
> cached
> > > > > > metadata,
> > > > > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > > > > Non-urgent: topics that are not in Urgent but have expired
> > max.age.
> > > > > > >
> > > > > > > Then when sending metadata, we always send ALL in the urgent
> > (i.e.
> > > > > ignore
> > > > > > > the size limit), and only when they do not exceed the size
> limit,
> > > > > > consider
> > > > > > > fill in more topics from Non-urgent up to the size limit.
> > > > > > >
> > > > > >
> > > > > > I think we're on the same page here. Urgent topics ignore the
> > target
> > > > > > metadata RPC size, and are not bound by it in any way, i.e. if
> > > there's
> > > > > 100
> > > > > > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> > > > > however,
> > > > > > if a topic transitions to urgent and there's several non-urgent
> > ones,
> > > > > we'll
> > > > > > piggyback the non-urgent updates up to the target size.
> > > > > >
> > > > > > Thanks,
> > > > > >  Brian
> > > > > >
> > > > > >
> > > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > > dengziming1993@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I think it's ok, and you can add another issue about
> > > `asynchronous
> > > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> > bbyrne@confluent.io
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello all,
> > > > > > > > >
> > > > > > > > > I've refactored the KIP to remove implementing asynchronous
> > > > > metadata
> > > > > > > > > fetching in the producer during send(). It's now
> exclusively
> > > > > focused
> > > > > > on
> > > > > > > > > reducing the topic metadata fetch payload and proposes
> > adding a
> > > > new
> > > > > > > > > configuration flag to control topic expiry behavior. Please
> > > take
> > > > a
> > > > > > look
> > > > > > > > > when possible.
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > > >
> > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Brian
> > > > > > > > >
> > > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > > bbyrne@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Lucas, Guozhang,
> > > > > > > > > >
> > > > > > > > > > Thank you for the comments. Good point on
> > > > > METADATA_MAX_AGE_CONFIG -
> > > > > > > it
> > > > > > > > > > looks like the ProducerMetadata was differentiating
> between
> > > > > expiry
> > > > > > > and
> > > > > > > > > > refresh, but it should be unnecessary to do so once the
> > cost
> > > of
> > > > > > > > fetching
> > > > > > > > > a
> > > > > > > > > > single topic's metadata is reduced.
> > > > > > > > > >
> > > > > > > > > > I've updated the rejected alternatives and removed the
> > config
> > > > > > > > variables.
> > > > > > > > > >
> > > > > > > > > > Brian
> > > > > > > > > >
> > > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > > wangguoz@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hello Brian,
> > > > > > > > > >>
> > > > > > > > > >> Thanks for the KIP.
> > > > > > > > > >>
> > > > > > > > > >> I think using asynchronous metadata update to address 1)
> > > > > metadata
> > > > > > > > update
> > > > > > > > > >> blocking send, but for other issues, currently at
> producer
> > > we
> > > > do
> > > > > > > have
> > > > > > > > a
> > > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to
> > consumer,
> > > by
> > > > > > > default
> > > > > > > > > is
> > > > > > > > > >> 5min. So maybe we do not need to introduce new configs
> > here,
> > > > but
> > > > > > > only
> > > > > > > > > >> change the semantics of that config from global expiry
> > > (today
> > > > we
> > > > > > > just
> > > > > > > > > >> enforce a full metadata update for the whole cluster) to
> > > > > > > single-topic
> > > > > > > > > >> expiry, and we can also extend its expiry deadline
> > whenever
> > > > that
> > > > > > > > > metadata
> > > > > > > > > >> is successfully used to send a produce request.
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> Guozhang
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > > > > lucas@confluent.io
> > > > > > > >
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Hi Brian,
> > > > > > > > > >> >
> > > > > > > > > >> > This looks great, and should help reduce blocking and
> > high
> > > > > > > metadata
> > > > > > > > > >> request
> > > > > > > > > >> > volumes when the producer is sending to large numbers
> of
> > > > > topics,
> > > > > > > > > >> especially
> > > > > > > > > >> > at low volumes. I think the approach to make metadata
> > > > fetching
> > > > > > > > > >> asynchronous
> > > > > > > > > >> > and batch metadata requests together will help
> > > > significantly.
> > > > > > > > > >> >
> > > > > > > > > >> > The only other approach I can think of is to allow
> users
> > > to
> > > > > > supply
> > > > > > > > the
> > > > > > > > > >> > producer with the expected topics upfront, allowing
> the
> > > > > producer
> > > > > > > to
> > > > > > > > > >> perform
> > > > > > > > > >> > a single initial metadata request before any sends
> > occur.
> > > I
> > > > > see
> > > > > > no
> > > > > > > > > real
> > > > > > > > > >> > advantages to this approach compared to the async
> method
> > > > > you’ve
> > > > > > > > > >> proposed,
> > > > > > > > > >> > but maybe we could add it to the rejected alternatives
> > > > > section?
> > > > > > > > > >> >
> > > > > > > > > >> > Thanks,
> > > > > > > > > >> >
> > > > > > > > > >> > Lucas
> > > > > > > > > >> >
> > > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > > bbyrne@confluent.io>
> > > > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > I've updated the 'Proposed Changes' to include two
> new
> > > > > > producer
> > > > > > > > > >> > > configuration variables: topic.expiry.ms and
> > > > > topic.refresh.ms
> > > > > > .
> > > > > > > > > Please
> > > > > > > > > >> > take
> > > > > > > > > >> > > a look.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks,
> > > > > > > > > >> > > Brian
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > > > > bbyrne@confluent.io
> > > > > > > > >
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Dev team,
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Requesting discussion for improvement to the
> > producer
> > > > when
> > > > > > > > dealing
> > > > > > > > > >> > with a
> > > > > > > > > >> > > > large number of topics.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > KIP:
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > JIRA:
> > > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks,
> > > > > > > > > >> > > > Brian
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> --
> > > > > > > > > >> -- Guozhang
> > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> Best,
> Stanislav
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Stanislav Kozlovski <st...@confluent.io>.
Hey Brian,

1. Could we more explicitly clarify the behavior of the algorithm when
`|T| > TARGET_METADATA_FETCH_SIZE`? I assume we ignore the config in that
scenario.
2. Should `targetMetadataFetchSize = Math.max(topicsPerSec / 10, 20)` be
`topicsPerSec * 10` ?
3. When is this new algorithm applied? To confirm my understanding - what
is the behavior of `metadata.max.age.ms` after this KIP? Are we adding a
new, more proactive metadata fetch for topics in |U|?
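For question 2, the two readings of the formula differ substantially; a
quick illustration (assuming `topicsPerSec` is the number of topics sent
to per second, as the KIP draft suggests):

```java
public class FetchSizeReadings {
    // The formula as written in the KIP draft.
    static int divideReading(int topicsPerSec)   { return Math.max(topicsPerSec / 10, 20); }
    // The reading proposed in the question above.
    static int multiplyReading(int topicsPerSec) { return Math.max(topicsPerSec * 10, 20); }

    public static void main(String[] args) {
        // At 50 topics/sec the divide reading floors at 20; the multiply
        // reading would target batches of 500 topics per metadata request.
        System.out.println(divideReading(50));    // prints 20
        System.out.println(multiplyReading(50));  // prints 500
    }
}
```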

Thanks,
Stanislav

On Thu, Dec 19, 2019 at 11:37 PM Brian Byrne <bb...@confluent.io> wrote:

> Hello everyone,
>
> For all interested, please take a look at the proposed algorithm as I'd
> like to get more feedback. I'll call for a vote once the break is over.
>
> Thanks,
> Brian
>
> On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Sounds good, I agree that should not make a big difference in practice.
> >
> > On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Hi Guozhang,
> > >
> > > I see, we agree on the topic threshold not applying to urgent topics,
> but
> > > differ slightly on what should be considered urgent. I would argue that
> > we
> > > should consider topics nearing the metadata.max.age.ms to be urgent
> > since
> > > they may still be well within the metadata.expiry.ms. That is, the
> > client
> > > still considers these topics to be relevant (not expired), but doesn't
> > want
> > > to incur the latency bubble of having to wait for the metadata to be
> > > re-fetched if it's stale. This could be a frequent case if the
> > > metadata.max.age.ms << metadata.expiry.ms.
> > >
> > > In practice, I wouldn't expect this to make a noticeable difference so
> I
> > > don't have a strong leaning, but the current behavior today is to
> > > aggressively refresh the metadata of stale topics by ensuring a refresh
> > is
> > > triggered before that metadata.max.age.ms duration elapses.
> > >
> > > Thanks,
> > > Brian
> > >
> > >
> > > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > > > Hello Brian,
> > > >
> > > > Thanks for your explanation, could you then update the wiki page for
> > the
> > > > algorithm part since when I read it, I thought it was different from
> > the
> > > > above, e.g. urgent topics should not be added just because of max.age
> > > > expiration, but should only be added if there are sending data
> pending.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io>
> > wrote:
> > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > Thanks for the feedback!
> > > > >
> > > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > > > >
> > > > > > 1. The addition of *metadata.expiry.ms <
> http://metadata.expiry.ms>
> > > > > *should
> > > > > > be included in the public interface. Also its semantics needs
> more
> > > > > > clarification (since previously it is hard-coded internally we do
> > not
> > > > > need
> > > > > > to explain it publicly, but now with the configurable value we do
> > > > need).
> > > > > >
> > > > >
> > > > > This was an oversight. Done.
> > > > >
> > > > >
> > > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in
> > the
> > > > > > proposal, maybe we need to explain why these magic values makes
> > sense
> > > > in
> > > > > > common scenarios.
> > > > > >
> > > > >
> > > > > So these are pretty fuzzy numbers, and seemed to be a decent
> balance
> > > > > between trade-offs. I've updated the target size to account for
> > setups
> > > > with
> > > > > a large number of topics or a shorter refresh time, as well as
> added
> > > some
> > > > > light rationale.
> > > > >
> > > > >
> > > > > > 3. In the Urgent set condition, do you actually mean "with no
> > cached
> > > > > > metadata AND there are existing data buffered for the topic"?
> > > > > >
> > > > >
> > > > > Yes, fixed.
> > > > >
> > > > >
> > > > >
> > > > > > One concern I have is whether or not we may introduce a
> regression,
> > > > > > especially during producer startup such that since we only
> require
> > up
> > > > to
> > > > > 25
> > > > > > topics each request, it may cause the send data to be buffered
> more
> > > > time
> > > > > > than now due to metadata not available. I understand this is a
> > > > > acknowledged
> > > > > > trade-off in our design but any regression that may surface to
> > users
> > > > need
> > > > > > to be very carefully considered. I'm wondering, e.g. if we can
> > tweak
> > > > our
> > > > > > algorithm for the Urgent set, e.g. to consider those with non
> > cached
> > > > > > metadata have higher priority than those who have elapsed max.age
> > but
> > > > not
> > > > > > yet have been called for sending. More specifically:
> > > > > >
> > > > > > Urgent: topics that have been requested for sending but no cached
> > > > > metadata,
> > > > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > > > Non-urgent: topics that are not in Urgent but have expired
> max.age.
> > > > > >
> > > > > > Then when sending metadata, we always send ALL in the urgent
> (i.e.
> > > > ignore
> > > > > > the size limit), and only when they do not exceed the size limit,
> > > > > consider
> > > > > > fill in more topics from Non-urgent up to the size limit.
> > > > > >
> > > > >
> > > > > I think we're on the same page here. Urgent topics ignore the
> target
> > > > > metadata RPC size, and are not bound by it in any way, i.e. if
> > there's
> > > > 100
> > > > > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> > > > however,
> > > > > if a topic transitions to urgent and there's several non-urgent
> ones,
> > > > we'll
> > > > > piggyback the non-urgent updates up to the target size.
> > > > >
> > > > > Thanks,
> > > > >  Brian
> > > > >
> > > > >
> > > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> > dengziming1993@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > I think it's ok, and you can add another issue about
> > `asynchronous
> > > > > > > metadata` if `topic expiry` is not enough.
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <
> bbyrne@confluent.io
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hello all,
> > > > > > > >
> > > > > > > > I've refactored the KIP to remove implementing asynchronous
> > > > metadata
> > > > > > > > fetching in the producer during send(). It's now exclusively
> > > > focused
> > > > > on
> > > > > > > > reducing the topic metadata fetch payload and proposes
> adding a
> > > new
> > > > > > > > configuration flag to control topic expiry behavior. Please
> > take
> > > a
> > > > > look
> > > > > > > > when possible.
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > >
> %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> > bbyrne@confluent.io
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Lucas, Guozhang,
> > > > > > > > >
> > > > > > > > > Thank you for the comments. Good point on
> > > > METADATA_MAX_AGE_CONFIG -
> > > > > > it
> > > > > > > > > looks like the ProducerMetadata was differentiating between
> > > > expiry
> > > > > > and
> > > > > > > > > refresh, but it should be unnecessary to do so once the
> cost
> > of
> > > > > > > fetching
> > > > > > > > a
> > > > > > > > > single topic's metadata is reduced.
> > > > > > > > >
> > > > > > > > > I've updated the rejected alternatives and removed the
> config
> > > > > > > variables.
> > > > > > > > >
> > > > > > > > > Brian
> > > > > > > > >
> > > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > > wangguoz@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hello Brian,
> > > > > > > > >>
> > > > > > > > >> Thanks for the KIP.
> > > > > > > > >>
> > > > > > > > >> I think using asynchronous metadata update to address 1)
> > > > metadata
> > > > > > > update
> > > > > > > > >> blocking send, but for other issues, currently at producer
> > we
> > > do
> > > > > > have
> > > > > > > a
> > > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to
> consumer,
> > by
> > > > > > default
> > > > > > > > is
> > > > > > > > >> 5min. So maybe we do not need to introduce new configs
> here,
> > > but
> > > > > > only
> > > > > > > > >> change the semantics of that config from global expiry
> > (today
> > > we
> > > > > > just
> > > > > > > > >> enforce a full metadata update for the whole cluster) to
> > > > > > single-topic
> > > > > > > > >> expiry, and we can also extend its expiry deadline
> whenever
> > > that
> > > > > > > > metadata
> > > > > > > > >> is successfully used to send a produce request.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Guozhang
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > > > lucas@confluent.io
> > > > > > >
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Brian,
> > > > > > > > >> >
> > > > > > > > >> > This looks great, and should help reduce blocking and
> high
> > > > > > metadata
> > > > > > > > >> request
> > > > > > > > >> > volumes when the producer is sending to large numbers of
> > > > topics,
> > > > > > > > >> especially
> > > > > > > > >> > at low volumes. I think the approach to make metadata
> > > fetching
> > > > > > > > >> asynchronous
> > > > > > > > >> > and batch metadata requests together will help
> > > significantly.
> > > > > > > > >> >
> > > > > > > > >> > The only other approach I can think of is to allow users
> > to
> > > > > supply
> > > > > > > the
> > > > > > > > >> > producer with the expected topics upfront, allowing the
> > > > producer
> > > > > > to
> > > > > > > > >> perform
> > > > > > > > >> > a single initial metadata request before any sends
> occur.
> > I
> > > > see
> > > > > no
> > > > > > > > real
> > > > > > > > >> > advantages to this approach compared to the async method
> > > > you’ve
> > > > > > > > >> proposed,
> > > > > > > > >> > but maybe we could add it to the rejected alternatives
> > > > section?
> > > > > > > > >> >
> > > > > > > > >> > Thanks,
> > > > > > > > >> >
> > > > > > > > >> > Lucas
> > > > > > > > >> >
> > > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > > bbyrne@confluent.io>
> > > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > I've updated the 'Proposed Changes' to include two new
> > > > > producer
> > > > > > > > >> > > configuration variables: topic.expiry.ms and
> > > > topic.refresh.ms
> > > > > .
> > > > > > > > Please
> > > > > > > > >> > take
> > > > > > > > >> > > a look.
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > > Brian
> > > > > > > > >> > >
> > > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > > > bbyrne@confluent.io
> > > > > > > >
> > > > > > > > >> > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Dev team,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Requesting discussion for improvement to the
> producer
> > > when
> > > > > > > dealing
> > > > > > > > >> > with a
> > > > > > > > >> > > > large number of topics.
> > > > > > > > >> > > >
> > > > > > > > >> > > > KIP:
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > > >> > > >
> > > > > > > > >> > > > JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks,
> > > > > > > > >> > > > Brian
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> -- Guozhang
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
Best,
Stanislav

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hello everyone,

For all interested, please take a look at the proposed algorithm as I'd
like to get more feedback. I'll call for a vote once the break is over.

Thanks,
Brian

On Mon, Dec 9, 2019 at 10:18 PM Guozhang Wang <wa...@gmail.com> wrote:

> Sounds good, I agree that should not make a big difference in practice.
>
> On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hi Guozhang,
> >
> > I see, we agree on the topic threshold not applying to urgent topics, but
> > differ slightly on what should be considered urgent. I would argue that
> we
> > should consider topics nearing the metadata.max.age.ms to be urgent
> since
> > they may still be well within the metadata.expiry.ms. That is, the
> client
> > still considers these topics to be relevant (not expired), but doesn't
> want
> > to incur the latency bubble of having to wait for the metadata to be
> > re-fetched if it's stale. This could be a frequent case if the
> > metadata.max.age.ms << metadata.expiry.ms.
> >
> > In practice, I wouldn't expect this to make a noticeable difference so I
> > don't have a strong leaning, but the current behavior today is to
> > aggressively refresh the metadata of stale topics by ensuring a refresh
> is
> > triggered before that metadata.max.age.ms duration elapses.
> >
> > Thanks,
> > Brian
> >
> >
> > On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Hello Brian,
> > >
> > > Thanks for your explanation, could you then update the wiki page for
> the
> > > algorithm part since when I read it, I thought it was different from
> the
> > > above, e.g. urgent topics should not be added just because of max.age
> > > expiration, but should only be added if there are sending data pending.
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io>
> wrote:
> > >
> > > > Hi Guozhang,
> > > >
> > > > Thanks for the feedback!
> > > >
> > > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > > >
> > > > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>
> > > > *should
> > > > > be included in the public interface. Also its semantics needs more
> > > > > clarification (since previously it is hard-coded internally we do
> not
> > > > need
> > > > > to explain it publicly, but now with the configurable value we do
> > > need).
> > > > >
> > > >
> > > > This was an oversight. Done.
> > > >
> > > >
> > > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in
> the
> > > > > proposal, maybe we need to explain why these magic values makes
> sense
> > > in
> > > > > common scenarios.
> > > > >
> > > >
> > > > So these are pretty fuzzy numbers, and seemed to be a decent balance
> > > > between trade-offs. I've updated the target size to account for
> setups
> > > with
> > > > a large number of topics or a shorter refresh time, as well as added
> > some
> > > > light rationale.
> > > >
> > > >
> > > > > 3. In the Urgent set condition, do you actually mean "with no
> cached
> > > > > metadata AND there are existing data buffered for the topic"?
> > > > >
> > > >
> > > > Yes, fixed.
> > > >
> > > >
> > > >
> > > > > One concern I have is whether or not we may introduce a regression,
> > > > > especially during producer startup such that since we only require
> up
> > > to
> > > > 25
> > > > > topics each request, it may cause the send data to be buffered more
> > > time
> > > > > than now due to metadata not available. I understand this is a
> > > > acknowledged
> > > > > trade-off in our design but any regression that may surface to
> users
> > > need
> > > > > to be very carefully considered. I'm wondering, e.g. if we can
> tweak
> > > our
> > > > > algorithm for the Urgent set, e.g. to consider those with non
> cached
> > > > > metadata have higher priority than those who have elapsed max.age
> but
> > > not
> > > > > yet have been called for sending. More specifically:
> > > > >
> > > > > Urgent: topics that have been requested for sending but no cached
> > > > metadata,
> > > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > > Non-urgent: topics that are not in Urgent but have expired max.age.
> > > > >
> > > > > Then when sending metadata, we always send ALL in the urgent (i.e.
> > > ignore
> > > > > the size limit), and only when they do not exceed the size limit,
> > > > consider
> > > > > fill in more topics from Non-urgent up to the size limit.
> > > > >
> > > >
> > > > I think we're on the same page here. Urgent topics ignore the target
> > > > metadata RPC size, and are not bound by it in any way, i.e. if
> there's
> > > 100
> > > > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> > > however,
> > > > if a topic transitions to urgent and there's several non-urgent ones,
> > > we'll
> > > > piggyback the non-urgent updates up to the target size.
> > > >
> > > > Thanks,
> > > >  Brian
> > > >
> > > >
> > > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <
> dengziming1993@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I think it's ok, and you can add another issue about
> `asynchronous
> > > > > > metadata` if `topic expiry` is not enough.
> > > > > >
> > > > > >
> > > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bbyrne@confluent.io
> >
> > > > wrote:
> > > > > >
> > > > > > > Hello all,
> > > > > > >
> > > > > > > I've refactored the KIP to remove implementing asynchronous
> > > metadata
> > > > > > > fetching in the producer during send(). It's now exclusively
> > > focused
> > > > on
> > > > > > > reducing the topic metadata fetch payload and proposes adding a
> > new
> > > > > > > configuration flag to control topic expiry behavior. Please
> take
> > a
> > > > look
> > > > > > > when possible.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Brian
> > > > > > >
> > > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <
> bbyrne@confluent.io
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Lucas, Guozhang,
> > > > > > > >
> > > > > > > > Thank you for the comments. Good point on
> > > METADATA_MAX_AGE_CONFIG -
> > > > > it
> > > > > > > > looks like the ProducerMetadata was differentiating between
> > > expiry
> > > > > and
> > > > > > > > refresh, but it should be unnecessary to do so once the cost
> of
> > > > > > fetching
> > > > > > > a
> > > > > > > > single topic's metadata is reduced.
> > > > > > > >
> > > > > > > > I've updated the rejected alternatives and removed the config
> > > > > > variables.
> > > > > > > >
> > > > > > > > Brian
> > > > > > > >
> > > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> > wangguoz@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hello Brian,
> > > > > > > >>
> > > > > > > >> Thanks for the KIP.
> > > > > > > >>
> > > > > > > >> I think using asynchronous metadata update to address 1)
> > > metadata
> > > > > > update
> > > > > > > >> blocking send, but for other issues, currently at producer
> we
> > do
> > > > > have
> > > > > > a
> > > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer,
> by
> > > > > default
> > > > > > > is
> > > > > > > >> 5min. So maybe we do not need to introduce new configs here,
> > but
> > > > > only
> > > > > > > >> change the semantics of that config from global expiry
> (today
> > we
> > > > > just
> > > > > > > >> enforce a full metadata update for the whole cluster) to
> > > > > single-topic
> > > > > > > >> expiry, and we can also extend its expiry deadline whenever
> > that
> > > > > > > metadata
> > > > > > > >> is successfully used to send a produce request.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Guozhang
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > > lucas@confluent.io
> > > > > >
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hi Brian,
> > > > > > > >> >
> > > > > > > >> > This looks great, and should help reduce blocking and high
> > > > > metadata
> > > > > > > >> request
> > > > > > > >> > volumes when the producer is sending to large numbers of
> > > topics,
> > > > > > > >> especially
> > > > > > > >> > at low volumes. I think the approach to make metadata
> > fetching
> > > > > > > >> asynchronous
> > > > > > > >> > and batch metadata requests together will help
> > significantly.
> > > > > > > >> >
> > > > > > > >> > The only other approach I can think of is to allow users
> to
> > > > supply
> > > > > > the
> > > > > > > >> > producer with the expected topics upfront, allowing the
> > > producer
> > > > > to
> > > > > > > >> perform
> > > > > > > >> > a single initial metadata request before any sends occur.
> I
> > > see
> > > > no
> > > > > > > real
> > > > > > > >> > advantages to this approach compared to the async method
> > > you’ve
> > > > > > > >> proposed,
> > > > > > > >> > but maybe we could add it to the rejected alternatives
> > > section?
> > > > > > > >> >
> > > > > > > >> > Thanks,
> > > > > > > >> >
> > > > > > > >> > Lucas
> > > > > > > >> >
> > > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > > bbyrne@confluent.io>
> > > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > I've updated the 'Proposed Changes' to include two new
> > > > producer
> > > > > > > >> > > configuration variables: topic.expiry.ms and
> > > topic.refresh.ms
> > > > .
> > > > > > > Please
> > > > > > > >> > take
> > > > > > > >> > > a look.
> > > > > > > >> > >
> > > > > > > >> > > Thanks,
> > > > > > > >> > > Brian
> > > > > > > >> > >
> > > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > > bbyrne@confluent.io
> > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Dev team,
> > > > > > > >> > > >
> > > > > > > >> > > > Requesting discussion for improvement to the producer
> > when
> > > > > > dealing
> > > > > > > >> > with a
> > > > > > > >> > > > large number of topics.
> > > > > > > >> > > >
> > > > > > > >> > > > KIP:
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > > >> > > >
> > > > > > > >> > > > JIRA:
> https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > > >> > > >
> > > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks,
> > > > > > > >> > > > Brian
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> -- Guozhang
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Sounds good, I agree that should not make a big difference in practice.

On Mon, Dec 9, 2019 at 2:07 PM Brian Byrne <bb...@confluent.io> wrote:

> Hi Guozhang,
>
> I see, we agree on the topic threshold not applying to urgent topics, but
> differ slightly on what should be considered urgent. I would argue that we
> should consider topics nearing the metadata.max.age.ms to be urgent since
> they may still be well within the metadata.expiry.ms. That is, the client
> still considers these topics to be relevant (not expired), but doesn't want
> to incur the latency bubble of having to wait for the metadata to be
> re-fetched if it's stale. This could be a frequent case if the
> metadata.max.age.ms << metadata.expiry.ms.
>
> In practice, I wouldn't expect this to make a noticeable difference so I
> don't have a strong leaning, but the current behavior today is to
> aggressively refresh the metadata of stale topics by ensuring a refresh is
> triggered before that metadata.max.age.ms duration elapses.
>
> Thanks,
> Brian
>
>
> On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello Brian,
> >
> > Thanks for your explanation, could you then update the wiki page for the
> > algorithm part since when I read it, I thought it was different from the
> > above, e.g. urgent topics should not be added just because of max.age
> > expiration, but should only be added if there are sending data pending.
> >
> >
> > Guozhang
> >
> > On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Hi Guozhang,
> > >
> > > Thanks for the feedback!
> > >
> > > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> > >
> > > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>
> > > *should
> > > > be included in the public interface. Also its semantics needs more
> > > > clarification (since previously it is hard-coded internally we do not
> > > need
> > > > to explain it publicly, but now with the configurable value we do
> > need).
> > > >
> > >
> > > This was an oversight. Done.
> > >
> > >
> > > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> > > > proposal, maybe we need to explain why these magic values makes sense
> > in
> > > > common scenarios.
> > > >
> > >
> > > So these are pretty fuzzy numbers, and seemed to be a decent balance
> > > between trade-offs. I've updated the target size to account for setups
> > with
> > > a large number of topics or a shorter refresh time, as well as added
> some
> > > light rationale.
> > >
> > >
> > > > 3. In the Urgent set condition, do you actually mean "with no cached
> > > > metadata AND there are existing data buffered for the topic"?
> > > >
> > >
> > > Yes, fixed.
> > >
> > >
> > >
> > > > One concern I have is whether or not we may introduce a regression,
> > > > especially during producer startup such that since we only require up
> > to
> > > 25
> > > > topics each request, it may cause the send data to be buffered more
> > time
> > > > than now due to metadata not available. I understand this is a
> > > acknowledged
> > > > trade-off in our design but any regression that may surface to users
> > need
> > > > to be very carefully considered. I'm wondering, e.g. if we can tweak
> > our
> > > > algorithm for the Urgent set, e.g. to consider those with non cached
> > > > metadata have higher priority than those who have elapsed max.age but
> > not
> > > > yet have been called for sending. More specifically:
> > > >
> > > > Urgent: topics that have been requested for sending but no cached
> > > metadata,
> > > > and topics that have send request failed with e.g. NOT_LEADER.
> > > > Non-urgent: topics that are not in Urgent but have expired max.age.
> > > >
> > > > Then when sending metadata, we always send ALL in the urgent (i.e.
> > ignore
> > > > the size limit), and only when they do not exceed the size limit,
> > > consider
> > > > fill in more topics from Non-urgent up to the size limit.
> > > >
> > >
> > > I think we're on the same page here. Urgent topics ignore the target
> > > metadata RPC size, and are not bound by it in any way, i.e. if there's
> > 100
> > > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> > however,
> > > if a topic transitions to urgent and there's several non-urgent ones,
> > we'll
> > > piggyback the non-urgent updates up to the target size.
> > >
> > > Thanks,
> > >  Brian
> > >
> > >
> > > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <de...@gmail.com>
> > > > wrote:
> > > >
> > > > > I think it's ok, and you can add another issue about `asynchronous
> > > > > metadata` if `topic expiry` is not enough.
> > > > >
> > > > >
> > > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Hello all,
> > > > > >
> > > > > > I've refactored the KIP to remove implementing asynchronous
> > metadata
> > > > > > fetching in the producer during send(). It's now exclusively
> > focused
> > > on
> > > > > > reducing the topic metadata fetch payload and proposes adding a
> new
> > > > > > configuration flag to control topic expiry behavior. Please take
> a
> > > look
> > > > > > when possible.
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > >
> > > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bbyrne@confluent.io
> >
> > > > wrote:
> > > > > >
> > > > > > > Lucas, Guozhang,
> > > > > > >
> > > > > > > Thank you for the comments. Good point on
> > METADATA_MAX_AGE_CONFIG -
> > > > it
> > > > > > > looks like the ProducerMetadata was differentiating between
> > expiry
> > > > and
> > > > > > > refresh, but it should be unnecessary to do so once the cost of
> > > > > fetching
> > > > > > a
> > > > > > > single topic's metadata is reduced.
> > > > > > >
> > > > > > > I've updated the rejected alternatives and removed the config
> > > > > variables.
> > > > > > >
> > > > > > > Brian
> > > > > > >
> > > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <
> wangguoz@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Hello Brian,
> > > > > > >>
> > > > > > >> Thanks for the KIP.
> > > > > > >>
> > > > > > >> I think using asynchronous metadata update to address 1)
> > metadata
> > > > > update
> > > > > > >> blocking send, but for other issues, currently at producer we
> do
> > > > have
> > > > > a
> > > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by
> > > > default
> > > > > > is
> > > > > > >> 5min. So maybe we do not need to introduce new configs here,
> but
> > > > only
> > > > > > >> change the semantics of that config from global expiry (today
> we
> > > > just
> > > > > > >> enforce a full metadata update for the whole cluster) to
> > > > single-topic
> > > > > > >> expiry, and we can also extend its expiry deadline whenever
> that
> > > > > > metadata
> > > > > > >> is successfully used to send a produce request.
> > > > > > >>
> > > > > > >>
> > > > > > >> Guozhang
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > > lucas@confluent.io
> > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >> > Hi Brian,
> > > > > > >> >
> > > > > > >> > This looks great, and should help reduce blocking and high
> > > > metadata
> > > > > > >> request
> > > > > > >> > volumes when the producer is sending to large numbers of
> > topics,
> > > > > > >> especially
> > > > > > >> > at low volumes. I think the approach to make metadata
> fetching
> > > > > > >> asynchronous
> > > > > > >> > and batch metadata requests together will help
> significantly.
> > > > > > >> >
> > > > > > >> > The only other approach I can think of is to allow users to
> > > supply
> > > > > the
> > > > > > >> > producer with the expected topics upfront, allowing the
> > producer
> > > > to
> > > > > > >> perform
> > > > > > >> > a single initial metadata request before any sends occur. I
> > see
> > > no
> > > > > > real
> > > > > > >> > advantages to this approach compared to the async method
> > you’ve
> > > > > > >> proposed,
> > > > > > >> > but maybe we could add it to the rejected alternatives
> > section?
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> >
> > > > > > >> > Lucas
> > > > > > >> >
> > > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> > bbyrne@confluent.io>
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > > I've updated the 'Proposed Changes' to include two new
> > > producer
> > > > > > >> > > configuration variables: topic.expiry.ms and
> > topic.refresh.ms
> > > .
> > > > > > Please
> > > > > > >> > take
> > > > > > >> > > a look.
> > > > > > >> > >
> > > > > > >> > > Thanks,
> > > > > > >> > > Brian
> > > > > > >> > >
> > > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > > bbyrne@confluent.io
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > >
> > > > > > >> > > > Dev team,
> > > > > > >> > > >
> > > > > > >> > > > Requesting discussion for improvement to the producer
> when
> > > > > dealing
> > > > > > >> > with a
> > > > > > >> > > > large number of topics.
> > > > > > >> > > >
> > > > > > >> > > > KIP:
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > > >> > > >
> > > > > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > > >> > > >
> > > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > > >> > > >
> > > > > > >> > > > Thanks,
> > > > > > >> > > > Brian
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> -- Guozhang
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hi Guozhang,

I see, we agree on the topic threshold not applying to urgent topics, but
differ slightly on what should be considered urgent. I would argue that we
should consider topics nearing the metadata.max.age.ms to be urgent since
they may still be well within the metadata.expiry.ms. That is, the client
still considers these topics to be relevant (not expired), but doesn't want
to incur the latency bubble of having to wait for the metadata to be
re-fetched if it's stale. This could be a frequent case if the
metadata.max.age.ms << metadata.expiry.ms.

In practice, I wouldn't expect this to make a noticeable difference, so I
don't have a strong leaning, but the current behavior is to aggressively
refresh the metadata of stale topics by ensuring a refresh is triggered
before the metadata.max.age.ms duration elapses.
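To make the distinction concrete, here is a minimal sketch of the urgency
rule argued for above. The function name, the 0.8 "nearing max age" cutoff,
and the use of seconds rather than milliseconds are illustrative
assumptions, not part of the KIP:

```python
# Illustrative values standing in for the producer configs discussed above:
# metadata.max.age.ms and the proposed metadata.expiry.ms (seconds here
# for brevity). The 0.8 "nearing max age" cutoff is an assumption.
METADATA_MAX_AGE = 300
METADATA_EXPIRY = 3600
NEAR_MAX_AGE_FRACTION = 0.8

def is_urgent(has_cached_metadata: bool, metadata_age: float,
              has_pending_data: bool) -> bool:
    """True if the topic's metadata should be fetched without batching delay.

    Urgent when there is buffered data but no metadata at all, or (per the
    argument above) when cached metadata is nearing metadata.max.age.ms,
    so a send never blocks waiting for a stale-metadata refetch.
    """
    if not has_cached_metadata:
        return has_pending_data
    return metadata_age >= NEAR_MAX_AGE_FRACTION * METADATA_MAX_AGE
```

Under this rule a topic past METADATA_EXPIRY would simply be dropped from
the cache rather than refreshed, which is why "nearing max age" is checked
independently of expiry.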

Thanks,
Brian


On Mon, Dec 9, 2019 at 11:57 AM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Brian,
>
> Thanks for your explanation, could you then update the wiki page for the
> algorithm part since when I read it, I thought it was different from the
> above, e.g. urgent topics should not be added just because of max.age
> expiration, but should only be added if there are sending data pending.
>
>
> Guozhang
>
> On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hi Guozhang,
> >
> > Thanks for the feedback!
> >
> > On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com> wrote:
> >
> > > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>
> > *should
> > > be included in the public interface. Also its semantics needs more
> > > clarification (since previously it is hard-coded internally we do not
> > need
> > > to explain it publicly, but now with the configurable value we do
> need).
> > >
> >
> > This was an oversight. Done.
> >
> >
> > > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> > > proposal, maybe we need to explain why these magic values makes sense
> in
> > > common scenarios.
> > >
> >
> > So these are pretty fuzzy numbers, and seemed to be a decent balance
> > between trade-offs. I've updated the target size to account for setups
> with
> > a large number of topics or a shorter refresh time, as well as added some
> > light rationale.
> >
> >
> > > 3. In the Urgent set condition, do you actually mean "with no cached
> > > metadata AND there are existing data buffered for the topic"?
> > >
> >
> > Yes, fixed.
> >
> >
> >
> > > One concern I have is whether or not we may introduce a regression,
> > > especially during producer startup such that since we only require up
> to
> > 25
> > > topics each request, it may cause the send data to be buffered more
> time
> > > than now due to metadata not available. I understand this is a
> > acknowledged
> > > trade-off in our design but any regression that may surface to users
> need
> > > to be very carefully considered. I'm wondering, e.g. if we can tweak
> our
> > > algorithm for the Urgent set, e.g. to consider those with non cached
> > > metadata have higher priority than those who have elapsed max.age but
> not
> > > yet have been called for sending. More specifically:
> > >
> > > Urgent: topics that have been requested for sending but no cached
> > metadata,
> > > and topics that have send request failed with e.g. NOT_LEADER.
> > > Non-urgent: topics that are not in Urgent but have expired max.age.
> > >
> > > Then when sending metadata, we always send ALL in the urgent (i.e.
> ignore
> > > the size limit), and only when they do not exceed the size limit,
> > consider
> > > fill in more topics from Non-urgent up to the size limit.
> > >
> >
> > I think we're on the same page here. Urgent topics ignore the target
> > metadata RPC size, and are not bound by it in any way, i.e. if there's
> 100
> > urgent topics, we'll fetch all 100 in a single RPC. Like you say,
> however,
> > if a topic transitions to urgent and there's several non-urgent ones,
> we'll
> > piggyback the non-urgent updates up to the target size.
> >
> > Thanks,
> >  Brian
> >
> >
> > On Wed, Nov 20, 2019 at 7:00 PM deng ziming <de...@gmail.com>
> > > wrote:
> > >
> > > > I think it's ok, and you can add another issue about `asynchronous
> > > > metadata` if `topic expiry` is not enough.
> > > >
> > > >
> > > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io>
> > wrote:
> > > >
> > > > > Hello all,
> > > > >
> > > > > I've refactored the KIP to remove implementing asynchronous
> metadata
> > > > > fetching in the producer during send(). It's now exclusively
> focused
> > on
> > > > > reducing the topic metadata fetch payload and proposes adding a new
> > > > > configuration flag to control topic expiry behavior. Please take a
> > look
> > > > > when possible.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > >
> > > > > Thanks,
> > > > > Brian
> > > > >
> > > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > > >
> > > > > > Lucas, Guozhang,
> > > > > >
> > > > > > Thank you for the comments. Good point on
> METADATA_MAX_AGE_CONFIG -
> > > it
> > > > > > looks like the ProducerMetadata was differentiating between
> expiry
> > > and
> > > > > > refresh, but it should be unnecessary to do so once the cost of
> > > > fetching
> > > > > a
> > > > > > single topic's metadata is reduced.
> > > > > >
> > > > > > I've updated the rejected alternatives and removed the config
> > > > variables.
> > > > > >
> > > > > > Brian
> > > > > >
> > > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wangguoz@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> Hello Brian,
> > > > > >>
> > > > > >> Thanks for the KIP.
> > > > > >>
> > > > > >> I think using asynchronous metadata update to address 1)
> metadata
> > > > update
> > > > > >> blocking send, but for other issues, currently at producer we do
> > > have
> > > > a
> > > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by
> > > default
> > > > > is
> > > > > >> 5min. So maybe we do not need to introduce new configs here, but
> > > only
> > > > > >> change the semantics of that config from global expiry (today we
> > > just
> > > > > >> enforce a full metadata update for the whole cluster) to
> > > single-topic
> > > > > >> expiry, and we can also extend its expiry deadline whenever that
> > > > > metadata
> > > > > >> is successfully used to send a produce request.
> > > > > >>
> > > > > >>
> > > > > >> Guozhang
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> > lucas@confluent.io
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Brian,
> > > > > >> >
> > > > > >> > This looks great, and should help reduce blocking and high
> > > metadata
> > > > > >> request
> > > > > >> > volumes when the producer is sending to large numbers of
> topics,
> > > > > >> especially
> > > > > >> > at low volumes. I think the approach to make metadata fetching
> > > > > >> asynchronous
> > > > > >> > and batch metadata requests together will help significantly.
> > > > > >> >
> > > > > >> > The only other approach I can think of is to allow users to
> > supply
> > > > the
> > > > > >> > producer with the expected topics upfront, allowing the
> producer
> > > to
> > > > > >> perform
> > > > > >> > a single initial metadata request before any sends occur. I
> see
> > no
> > > > > real
> > > > > >> > advantages to this approach compared to the async method
> you’ve
> > > > > >> proposed,
> > > > > >> > but maybe we could add it to the rejected alternatives
> section?
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> >
> > > > > >> > Lucas
> > > > > >> >
> > > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <
> bbyrne@confluent.io>
> > > > > wrote:
> > > > > >> >
> > > > > >> > > I've updated the 'Proposed Changes' to include two new
> > producer
> > > > > >> > > configuration variables: topic.expiry.ms and
> topic.refresh.ms
> > .
> > > > > Please
> > > > > >> > take
> > > > > >> > > a look.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Brian
> > > > > >> > >
> > > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > > bbyrne@confluent.io
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > > > Dev team,
> > > > > >> > > >
> > > > > >> > > > Requesting discussion for improvement to the producer when
> > > > dealing
> > > > > >> > with a
> > > > > >> > > > large number of topics.
> > > > > >> > > >
> > > > > >> > > > KIP:
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > > >> > > >
> > > > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > > > >> > > >
> > > > > >> > > > Thoughts and feedback would be appreciated.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Brian
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> -- Guozhang
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Brian,

Thanks for your explanation. Could you then update the algorithm section
of the wiki page? When I read it, it seemed different from the above,
e.g. urgent topics should not be added just because of max.age expiration,
but only when there is send data pending.


Guozhang

On Mon, Dec 9, 2019 at 10:57 AM Brian Byrne <bb...@confluent.io> wrote:

> Hi Guozhang,
>
> Thanks for the feedback!
>
> On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms>
> *should
> > be included in the public interface. Also its semantics needs more
> > clarification (since previously it is hard-coded internally we do not
> need
> > to explain it publicly, but now with the configurable value we do need).
> >
>
> This was an oversight. Done.
>
>
> > 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> > proposal, maybe we need to explain why these magic values makes sense in
> > common scenarios.
> >
>
> So these are pretty fuzzy numbers, and seemed to be a decent balance
> between trade-offs. I've updated the target size to account for setups with
> a large number of topics or a shorter refresh time, as well as added some
> light rationale.
>
>
> > 3. In the Urgent set condition, do you actually mean "with no cached
> > metadata AND there are existing data buffered for the topic"?
> >
>
> Yes, fixed.
>
>
>
> > One concern I have is whether or not we may introduce a regression,
> > especially during producer startup such that since we only require up to
> 25
> > topics each request, it may cause the send data to be buffered more time
> > than now due to metadata not available. I understand this is a
> acknowledged
> > trade-off in our design but any regression that may surface to users need
> > to be very carefully considered. I'm wondering, e.g. if we can tweak our
> > algorithm for the Urgent set, e.g. to consider those with non cached
> > metadata have higher priority than those who have elapsed max.age but not
> > yet have been called for sending. More specifically:
> >
> > Urgent: topics that have been requested for sending but no cached
> metadata,
> > and topics that have send request failed with e.g. NOT_LEADER.
> > Non-urgent: topics that are not in Urgent but have expired max.age.
> >
> > Then when sending metadata, we always send ALL in the urgent (i.e. ignore
> > the size limit), and only when they do not exceed the size limit,
> consider
> > fill in more topics from Non-urgent up to the size limit.
> >
>
> I think we're on the same page here. Urgent topics ignore the target
> metadata RPC size, and are not bound by it in any way, i.e. if there's 100
> urgent topics, we'll fetch all 100 in a single RPC. Like you say, however,
> if a topic transitions to urgent and there's several non-urgent ones, we'll
> piggyback the non-urgent updates up to the target size.
>
> Thanks,
>  Brian
>
>
> On Wed, Nov 20, 2019 at 7:00 PM deng ziming <de...@gmail.com>
> > wrote:
> >
> > > I think it's ok, and you can add another issue about `asynchronous
> > > metadata` if `topic expiry` is not enough.
> > >
> > >
> > > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io>
> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I've refactored the KIP to remove implementing asynchronous metadata
> > > > fetching in the producer during send(). It's now exclusively focused
> on
> > > > reducing the topic metadata fetch payload and proposes adding a new
> > > > configuration flag to control topic expiry behavior. Please take a
> look
> > > > when possible.
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io>
> > wrote:
> > > >
> > > > > Lucas, Guozhang,
> > > > >
> > > > > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG -
> > it
> > > > > looks like the ProducerMetadata was differentiating between expiry
> > and
> > > > > refresh, but it should be unnecessary to do so once the cost of
> > > fetching
> > > > a
> > > > > single topic's metadata is reduced.
> > > > >
> > > > > I've updated the rejected alternatives and removed the config
> > > variables.
> > > > >
> > > > > Brian
> > > > >
> > > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Hello Brian,
> > > > >>
> > > > >> Thanks for the KIP.
> > > > >>
> > > > >> I think using asynchronous metadata update to address 1) metadata
> > > update
> > > > >> blocking send, but for other issues, currently at producer we do
> > have
> > > a
> > > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by
> > default
> > > > is
> > > > >> 5min. So maybe we do not need to introduce new configs here, but
> > only
> > > > >> change the semantics of that config from global expiry (today we
> > just
> > > > >> enforce a full metadata update for the whole cluster) to
> > single-topic
> > > > >> expiry, and we can also extend its expiry deadline whenever that
> > > > metadata
> > > > >> is successfully used to send a produce request.
> > > > >>
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <
> lucas@confluent.io
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Brian,
> > > > >> >
> > > > >> > This looks great, and should help reduce blocking and high
> > metadata
> > > > >> request
> > > > >> > volumes when the producer is sending to large numbers of topics,
> > > > >> especially
> > > > >> > at low volumes. I think the approach to make metadata fetching
> > > > >> asynchronous
> > > > >> > and batch metadata requests together will help significantly.
> > > > >> >
> > > > >> > The only other approach I can think of is to allow users to
> supply
> > > the
> > > > >> > producer with the expected topics upfront, allowing the producer
> > to
> > > > >> perform
> > > > >> > a single initial metadata request before any sends occur. I see
> no
> > > > real
> > > > >> > advantages to this approach compared to the async method you’ve
> > > > >> proposed,
> > > > >> > but maybe we could add it to the rejected alternatives section?
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Lucas
> > > > >> >
> > > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io>
> > > > wrote:
> > > > >> >
> > > > >> > > I've updated the 'Proposed Changes' to include two new
> producer
> > > > >> > > configuration variables: topic.expiry.ms and topic.refresh.ms
> .
> > > > Please
> > > > >> > take
> > > > >> > > a look.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Brian
> > > > >> > >
> > > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> > bbyrne@confluent.io
> > > >
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Dev team,
> > > > >> > > >
> > > > >> > > > Requesting discussion for improvement to the producer when
> > > dealing
> > > > >> > with a
> > > > >> > > > large number of topics.
> > > > >> > > >
> > > > >> > > > KIP:
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > > >> > > >
> > > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > > >> > > >
> > > > >> > > > Thoughts and feedback would be appreciated.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Brian
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -- Guozhang
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hi Guozhang,

Thanks for the feedback!

On Sun, Dec 8, 2019 at 6:25 PM Guozhang Wang <wa...@gmail.com> wrote:

> 1. The addition of *metadata.expiry.ms <http://metadata.expiry.ms> *should
> be included in the public interface. Also its semantics needs more
> clarification (since previously it is hard-coded internally we do not need
> to explain it publicly, but now with the configurable value we do need).
>

This was an oversight. Done.


> 2. There are a couple of hard-coded parameters like 25 and 0.5 in the
> proposal, maybe we need to explain why these magic values makes sense in
> common scenarios.
>

So these are pretty fuzzy numbers, and seemed to be a decent balance
between trade-offs. I've updated the target size to account for setups with
a large number of topics or a shorter refresh time, as well as added some
light rationale.


> 3. In the Urgent set condition, do you actually mean "with no cached
> metadata AND there are existing data buffered for the topic"?
>

Yes, fixed.



> One concern I have is whether or not we may introduce a regression,
> especially during producer startup such that since we only require up to 25
> topics each request, it may cause the send data to be buffered more time
> than now due to metadata not available. I understand this is a acknowledged
> trade-off in our design but any regression that may surface to users need
> to be very carefully considered. I'm wondering, e.g. if we can tweak our
> algorithm for the Urgent set, e.g. to consider those with non cached
> metadata have higher priority than those who have elapsed max.age but not
> yet have been called for sending. More specifically:
>
> Urgent: topics that have been requested for sending but no cached metadata,
> and topics that have send request failed with e.g. NOT_LEADER.
> Non-urgent: topics that are not in Urgent but have expired max.age.
>
> Then when sending metadata, we always send ALL in the urgent (i.e. ignore
> the size limit), and only when they do not exceed the size limit, consider
> fill in more topics from Non-urgent up to the size limit.
>

I think we're on the same page here. Urgent topics ignore the target
metadata RPC size and are not bound by it in any way, i.e. if there are
100 urgent topics, we'll fetch all 100 in a single RPC. Like you say,
however, if a topic transitions to urgent and there are several non-urgent
ones, we'll piggyback the non-urgent updates up to the target size.
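A quick sketch of that request-building rule. The target size of 25 comes
from the KIP discussion, while the function name and the list-based
representation are assumptions for illustration:

```python
TARGET_BATCH_SIZE = 25  # target metadata RPC size discussed in the KIP

def build_metadata_request(urgent, non_urgent):
    """Select topics for the next metadata RPC.

    All urgent topics are always included, even when they exceed the
    target size; non-urgent topics are piggybacked only up to the target.
    """
    topics = list(urgent)                         # urgent ignores the limit
    room = max(0, TARGET_BATCH_SIZE - len(topics))
    topics.extend(non_urgent[:room])              # piggyback up to target
    return topics
```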

Thanks,
 Brian


On Wed, Nov 20, 2019 at 7:00 PM deng ziming <de...@gmail.com>
> wrote:
>
> > I think it's ok, and you can add another issue about `asynchronous
> > metadata` if `topic expiry` is not enough.
> >
> >
> > On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Hello all,
> > >
> > > I've refactored the KIP to remove implementing asynchronous metadata
> > > fetching in the producer during send(). It's now exclusively focused on
> > > reducing the topic metadata fetch payload and proposes adding a new
> > > configuration flag to control topic expiry behavior. Please take a look
> > > when possible.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io>
> wrote:
> > >
> > > > Lucas, Guozhang,
> > > >
> > > > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG -
> it
> > > > looks like the ProducerMetadata was differentiating between expiry
> and
> > > > refresh, but it should be unnecessary to do so once the cost of
> > fetching
> > > a
> > > > single topic's metadata is reduced.
> > > >
> > > > I've updated the rejected alternatives and removed the config
> > variables.
> > > >
> > > > Brian
> > > >
> > > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > > >
> > > >> Hello Brian,
> > > >>
> > > >> Thanks for the KIP.
> > > >>
> > > >> I think using asynchronous metadata update to address 1) metadata
> > update
> > > >> blocking send, but for other issues, currently at producer we do
> have
> > a
> > > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by
> default
> > > is
> > > >> 5min. So maybe we do not need to introduce new configs here, but
> only
> > > >> change the semantics of that config from global expiry (today we
> just
> > > >> enforce a full metadata update for the whole cluster) to
> single-topic
> > > >> expiry, and we can also extend its expiry deadline whenever that
> > > metadata
> > > >> is successfully used to send a produce request.
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lucas@confluent.io
> >
> > > >> wrote:
> > > >>
> > > >> > Hi Brian,
> > > >> >
> > > >> > This looks great, and should help reduce blocking and high
> metadata
> > > >> request
> > > >> > volumes when the producer is sending to large numbers of topics,
> > > >> especially
> > > >> > at low volumes. I think the approach to make metadata fetching
> > > >> asynchronous
> > > >> > and batch metadata requests together will help significantly.
> > > >> >
> > > >> > The only other approach I can think of is to allow users to supply
> > the
> > > >> > producer with the expected topics upfront, allowing the producer
> to
> > > >> perform
> > > >> > a single initial metadata request before any sends occur. I see no
> > > real
> > > >> > advantages to this approach compared to the async method you’ve
> > > >> proposed,
> > > >> > but maybe we could add it to the rejected alternatives section?
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Lucas
> > > >> >
> > > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io>
> > > wrote:
> > > >> >
> > > >> > > I've updated the 'Proposed Changes' to include two new producer
> > > >> > > configuration variables: topic.expiry.ms and topic.refresh.ms.
> > > Please
> > > >> > take
> > > >> > > a look.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Brian
> > > >> > >
> > > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <
> bbyrne@confluent.io
> > >
> > > >> > wrote:
> > > >> > >
> > > >> > > > Dev team,
> > > >> > > >
> > > >> > > > Requesting discussion for improvement to the producer when
> > dealing
> > > >> > with a
> > > >> > > > large number of topics.
> > > >> > > >
> > > >> > > > KIP:
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > >> > > >
> > > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > >> > > >
> > > >> > > > Thoughts and feedback would be appreciated.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Brian
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Brian,

Thanks for the updated PR and sorry for the late reply. I reviewed the page
again and here are some more comments:

Minor:

1. The addition of *metadata.expiry.ms* should
be included in the public interface. Also its semantics needs more
clarification (since previously it is hard-coded internally we do not need
to explain it publicly, but now with the configurable value we do need).
2. There are a couple of hard-coded parameters like 25 and 0.5 in the
proposal; maybe we need to explain why these magic values make sense in
common scenarios.
3. In the Urgent set condition, do you actually mean "with no cached
metadata AND there are existing data buffered for the topic"?

Meta:

One concern I have is whether or not we may introduce a regression,
especially during producer startup: since we only request up to 25 topics
per metadata request, send data may be buffered longer than it is today
because metadata is not yet available. I understand this is an
acknowledged trade-off in our design, but any regression that may surface
to users needs to be very carefully considered. I'm wondering, e.g.,
whether we can tweak our algorithm for the Urgent set to give topics with
no cached metadata higher priority than those whose max.age has elapsed
but which have not yet been called for sending. More specifically:

Urgent: topics that have been requested for sending but have no cached
metadata, and topics whose send requests have failed with e.g. NOT_LEADER.
Non-urgent: topics that are not in Urgent but have exceeded max.age.

Then when requesting metadata, we always send ALL topics in the Urgent set
(i.e. ignore the size limit), and only when they do not exceed the size
limit do we fill in more topics from Non-urgent, up to the size limit.
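The Urgent/Non-urgent split above could be sketched as follows; the
TopicState fields are hypothetical bookkeeping for illustration, not
Kafka's actual internal representation:

```python
from dataclasses import dataclass

@dataclass
class TopicState:
    # Hypothetical per-topic bookkeeping; field names are illustrative.
    has_cached_metadata: bool
    has_pending_send: bool
    last_send_failed_retriable: bool  # e.g. NOT_LEADER
    metadata_age_ms: int

MAX_AGE_MS = 5 * 60 * 1000  # metadata.max.age.ms default (5 minutes)

def classify(topics):
    """Split topics into Urgent and Non-urgent per the rules above."""
    urgent, non_urgent = [], []
    for name, st in topics.items():
        if (st.has_pending_send and not st.has_cached_metadata) \
                or st.last_send_failed_retriable:
            urgent.append(name)
        elif st.metadata_age_ms >= MAX_AGE_MS:
            non_urgent.append(name)
    return urgent, non_urgent
```

A metadata request would then include everything in the urgent list
unconditionally and fill from the non-urgent list only up to the size
limit.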


Guozhang



On Wed, Nov 20, 2019 at 7:00 PM deng ziming <de...@gmail.com>
wrote:

> I think it's ok, and you can add another issue about `asynchronous
> metadata` if `topic expiry` is not enough.
>
>
> On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io> wrote:
>
> > Hello all,
> >
> > I've refactored the KIP to remove implementing asynchronous metadata
> > fetching in the producer during send(). It's now exclusively focused on
> > reducing the topic metadata fetch payload and proposes adding a new
> > configuration flag to control topic expiry behavior. Please take a look
> > when possible.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> >
> > Thanks,
> > Brian
> >
> > On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io> wrote:
> >
> > > Lucas, Guozhang,
> > >
> > > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
> > > looks like the ProducerMetadata was differentiating between expiry and
> > > refresh, but it should be unnecessary to do so once the cost of
> fetching
> > a
> > > single topic's metadata is reduced.
> > >
> > > I've updated the rejected alternatives and removed the config
> variables.
> > >
> > > Brian
> > >
> > > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com>
> wrote:
> > >
> > >> Hello Brian,
> > >>
> > >> Thanks for the KIP.
> > >>
> > >> I think using asynchronous metadata update to address 1) metadata
> update
> > >> blocking send, but for other issues, currently at producer we do have
> a
> > >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by default
> > is
> > >> 5min. So maybe we do not need to introduce new configs here, but only
> > >> change the semantics of that config from global expiry (today we just
> > >> enforce a full metadata update for the whole cluster) to single-topic
> > >> expiry, and we can also extend its expiry deadline whenever that
> > metadata
> > >> is successfully used to send a produce request.
> > >>
> > >>
> > >> Guozhang
> > >>
> > >>
> > >>
> > >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io>
> > >> wrote:
> > >>
> > >> > Hi Brian,
> > >> >
> > >> > This looks great, and should help reduce blocking and high metadata
> > >> request
> > >> > volumes when the producer is sending to large numbers of topics,
> > >> especially
> > >> > at low volumes. I think the approach to make metadata fetching
> > >> asynchronous
> > >> > and batch metadata requests together will help significantly.
> > >> >
> > >> > The only other approach I can think of is to allow users to supply
> the
> > >> > producer with the expected topics upfront, allowing the producer to
> > >> perform
> > >> > a single initial metadata request before any sends occur. I see no
> > real
> > >> > advantages to this approach compared to the async method you’ve
> > >> proposed,
> > >> > but maybe we could add it to the rejected alternatives section?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Lucas
> > >> >
> > >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io>
> > wrote:
> > >> >
> > >> > > I've updated the 'Proposed Changes' to include two new producer
> > >> > > configuration variables: topic.expiry.ms and topic.refresh.ms.
> > Please
> > >> > take
> > >> > > a look.
> > >> > >
> > >> > > Thanks,
> > >> > > Brian
> > >> > >
> > >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bbyrne@confluent.io
> >
> > >> > wrote:
> > >> > >
> > >> > > > Dev team,
> > >> > > >
> > >> > > > Requesting discussion for improvement to the producer when
> dealing
> > >> > with a
> > >> > > > large number of topics.
> > >> > > >
> > >> > > > KIP:
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > >> > > >
> > >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > >> > > >
> > >> > > > Thoughts and feedback would be appreciated.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Brian
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by deng ziming <de...@gmail.com>.
I think it's ok, and you can add another issue about `asynchronous
metadata` if `topic expiry` is not enough.


On Thu, Nov 21, 2019 at 6:20 AM Brian Byrne <bb...@confluent.io> wrote:

> Hello all,
>
> I've refactored the KIP to remove implementing asynchronous metadata
> fetching in the producer during send(). It's now exclusively focused on
> reducing the topic metadata fetch payload and proposes adding a new
> configuration flag to control topic expiry behavior. Please take a look
> when possible.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
>
> Thanks,
> Brian
>
> On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io> wrote:
>
> > Lucas, Guozhang,
> >
> > Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
> > looks like the ProducerMetadata was differentiating between expiry and
> > refresh, but it should be unnecessary to do so once the cost of fetching
> a
> > single topic's metadata is reduced.
> >
> > I've updated the rejected alternatives and removed the config variables.
> >
> > Brian
> >
> > On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com> wrote:
> >
> >> Hello Brian,
> >>
> >> Thanks for the KIP.
> >>
> >> I think using asynchronous metadata update to address 1) metadata update
> >> blocking send, but for other issues, currently at producer we do have a
> >> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by default
> is
> >> 5min. So maybe we do not need to introduce new configs here, but only
> >> change the semantics of that config from global expiry (today we just
> >> enforce a full metadata update for the whole cluster) to single-topic
> >> expiry, and we can also extend its expiry deadline whenever that
> metadata
> >> is successfully used to send a produce request.
> >>
> >>
> >> Guozhang
> >>
> >>
> >>
> >> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io>
> >> wrote:
> >>
> >> > Hi Brian,
> >> >
> >> > This looks great, and should help reduce blocking and high metadata
> >> request
> >> > volumes when the producer is sending to large numbers of topics,
> >> especially
> >> > at low volumes. I think the approach to make metadata fetching
> >> asynchronous
> >> > and batch metadata requests together will help significantly.
> >> >
> >> > The only other approach I can think of is to allow users to supply the
> >> > producer with the expected topics upfront, allowing the producer to
> >> perform
> >> > a single initial metadata request before any sends occur. I see no
> real
> >> > advantages to this approach compared to the async method you’ve
> >> proposed,
> >> > but maybe we could add it to the rejected alternatives section?
> >> >
> >> > Thanks,
> >> >
> >> > Lucas
> >> >
> >> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io>
> wrote:
> >> >
> >> > > I've updated the 'Proposed Changes' to include two new producer
> >> > > configuration variables: topic.expiry.ms and topic.refresh.ms.
> Please
> >> > take
> >> > > a look.
> >> > >
> >> > > Thanks,
> >> > > Brian
> >> > >
> >> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bb...@confluent.io>
> >> > wrote:
> >> > >
> >> > > > Dev team,
> >> > > >
> >> > > > Requesting discussion for improvement to the producer when dealing
> >> > with a
> >> > > > large number of topics.
> >> > > >
> >> > > > KIP:
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> >> > > >
> >> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> >> > > >
> >> > > > Thoughts and feedback would be appreciated.
> >> > > >
> >> > > > Thanks,
> >> > > > Brian
> >> > > >
> >> > >
> >> >
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Hello all,

I've refactored the KIP to remove implementing asynchronous metadata
fetching in the producer during send(). It's now exclusively focused on
reducing the topic metadata fetch payload and proposes adding a new
configuration flag to control topic expiry behavior. Please take a look
when possible.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
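
As a rough illustration of the reduced-payload idea — using hypothetical
names, not the producer's actual internals — the next metadata request could
be built from only the topics still inside the expiry window, rather than
every topic the producer has ever seen:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: computes the topic set for the next metadata request by
// filtering out topics that have gone unused longer than the expiry window.
// Class and method names are illustrative, not Kafka's internal API.
public class MetadataRequestBuilder {
    // lastUsedMs maps topic -> last time its metadata served a produce request.
    public static Set<String> topicsToFetch(Map<String, Long> lastUsedMs,
                                            long expiryMs, long nowMs) {
        Set<String> active = new HashSet<>();
        for (Map.Entry<String, Long> e : lastUsedMs.entrySet()) {
            if (nowMs - e.getValue() < expiryMs) {
                active.add(e.getKey()); // still in use: include in next fetch
            }
        }
        return active;
    }
}
```

With a large but mostly idle topic set, this keeps the metadata request
proportional to the topics actually being produced to.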

Thanks,
Brian

On Fri, Oct 4, 2019 at 10:04 AM Brian Byrne <bb...@confluent.io> wrote:

> Lucas, Guozhang,
>
> Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
> looks like the ProducerMetadata was differentiating between expiry and
> refresh, but it should be unnecessary to do so once the cost of fetching a
> single topic's metadata is reduced.
>
> I've updated the rejected alternatives and removed the config variables.
>
> Brian
>
> On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com> wrote:
>
>> Hello Brian,
>>
>> Thanks for the KIP.
>>
>> I think using asynchronous metadata update to address 1) metadata update
>> blocking send, but for other issues, currently at producer we do have a
>> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by default is
>> 5min. So maybe we do not need to introduce new configs here, but only
>> change the semantics of that config from global expiry (today we just
>> enforce a full metadata update for the whole cluster) to single-topic
>> expiry, and we can also extend its expiry deadline whenever that metadata
>> is successfully used to send a produce request.
>>
>>
>> Guozhang
>>
>>
>>
>> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io>
>> wrote:
>>
>> > Hi Brian,
>> >
>> > This looks great, and should help reduce blocking and high metadata
>> request
>> > volumes when the producer is sending to large numbers of topics,
>> especially
>> > at low volumes. I think the approach to make metadata fetching
>> asynchronous
>> > and batch metadata requests together will help significantly.
>> >
>> > The only other approach I can think of is to allow users to supply the
>> > producer with the expected topics upfront, allowing the producer to
>> perform
>> > a single initial metadata request before any sends occur. I see no real
>> > advantages to this approach compared to the async method you’ve
>> proposed,
>> > but maybe we could add it to the rejected alternatives section?
>> >
>> > Thanks,
>> >
>> > Lucas
>> >
>> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io> wrote:
>> >
>> > > I've updated the 'Proposed Changes' to include two new producer
>> > > configuration variables: topic.expiry.ms and topic.refresh.ms. Please
>> > take
>> > > a look.
>> > >
>> > > Thanks,
>> > > Brian
>> > >
>> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bb...@confluent.io>
>> > wrote:
>> > >
>> > > > Dev team,
>> > > >
>> > > > Requesting discussion for improvement to the producer when dealing
>> > with a
>> > > > large number of topics.
>> > > >
>> > > > KIP:
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
>> > > >
>> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
>> > > >
>> > > > Thoughts and feedback would be appreciated.
>> > > >
>> > > > Thanks,
>> > > > Brian
>> > > >
>> > >
>> >
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Brian Byrne <bb...@confluent.io>.
Lucas, Guozhang,

Thank you for the comments. Good point on METADATA_MAX_AGE_CONFIG - it
looks like the ProducerMetadata was differentiating between expiry and
refresh, but it should be unnecessary to do so once the cost of fetching a
single topic's metadata is reduced.

I've updated the rejected alternatives and removed the config variables.

Brian

On Fri, Oct 4, 2019 at 9:20 AM Guozhang Wang <wa...@gmail.com> wrote:

> Hello Brian,
>
> Thanks for the KIP.
>
> I think using asynchronous metadata update to address 1) metadata update
> blocking send, but for other issues, currently at producer we do have a
> configurable `METADATA_MAX_AGE_CONFIG` similar to consumer, by default is
> 5min. So maybe we do not need to introduce new configs here, but only
> change the semantics of that config from global expiry (today we just
> enforce a full metadata update for the whole cluster) to single-topic
> expiry, and we can also extend its expiry deadline whenever that metadata
> is successfully used to send a produce request.
>
>
> Guozhang
>
>
>
> On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io>
> wrote:
>
> > Hi Brian,
> >
> > This looks great, and should help reduce blocking and high metadata
> request
> > volumes when the producer is sending to large numbers of topics,
> especially
> > at low volumes. I think the approach to make metadata fetching
> asynchronous
> > and batch metadata requests together will help significantly.
> >
> > The only other approach I can think of is to allow users to supply the
> > producer with the expected topics upfront, allowing the producer to
> perform
> > a single initial metadata request before any sends occur. I see no real
> > advantages to this approach compared to the async method you’ve proposed,
> > but maybe we could add it to the rejected alternatives section?
> >
> > Thanks,
> >
> > Lucas
> >
> > On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io> wrote:
> >
> > > I've updated the 'Proposed Changes' to include two new producer
> > > configuration variables: topic.expiry.ms and topic.refresh.ms. Please
> > take
> > > a look.
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bb...@confluent.io>
> > wrote:
> > >
> > > > Dev team,
> > > >
> > > > Requesting discussion for improvement to the producer when dealing
> > with a
> > > > large number of topics.
> > > >
> > > > KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > >
> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > > >
> > > > Thoughts and feedback would be appreciated.
> > > >
> > > > Thanks,
> > > > Brian
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Brian,

Thanks for the KIP.

I think asynchronous metadata updates address 1) metadata updates blocking
send. For the other issues, the producer already has a configurable
`METADATA_MAX_AGE_CONFIG`, similar to the consumer's, which defaults to
5min. So maybe we do not need to introduce new configs here, but only
change the semantics of that config from global expiry (today we just
enforce a full metadata update for the whole cluster) to single-topic
expiry, and we can also extend its expiry deadline whenever that metadata
is successfully used to send a produce request.
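
To make that concrete, here is a minimal sketch of the proposed semantics —
per-topic deadlines extended on successful use, rather than one cluster-wide
timer. The class and method names are hypothetical, not Kafka's actual
internals:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: tracks a separate expiry deadline per topic. A topic's
// deadline is extended whenever its metadata is used to send a successful
// produce request; a topic expires individually once it goes unused for
// the configured max age (reusing METADATA_MAX_AGE_CONFIG's value).
public class TopicExpiryTracker {
    private final long metadataMaxAgeMs;
    private final Map<String, Long> lastUsedMs = new HashMap<>();

    public TopicExpiryTracker(long metadataMaxAgeMs) {
        this.metadataMaxAgeMs = metadataMaxAgeMs;
    }

    // Called when a produce request using this topic's metadata succeeds.
    public void recordUse(String topic, long nowMs) {
        lastUsedMs.put(topic, nowMs);
    }

    // Unknown or long-idle topics are expired and need a metadata refresh.
    public boolean isExpired(String topic, long nowMs) {
        Long last = lastUsedMs.get(topic);
        return last == null || nowMs - last >= metadataMaxAgeMs;
    }
}
```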


Guozhang



On Thu, Oct 3, 2019 at 6:51 PM Lucas Bradstreet <lu...@confluent.io> wrote:

> Hi Brian,
>
> This looks great, and should help reduce blocking and high metadata request
> volumes when the producer is sending to large numbers of topics, especially
> at low volumes. I think the approach to make metadata fetching asynchronous
> and batch metadata requests together will help significantly.
>
> The only other approach I can think of is to allow users to supply the
> producer with the expected topics upfront, allowing the producer to perform
> a single initial metadata request before any sends occur. I see no real
> advantages to this approach compared to the async method you’ve proposed,
> but maybe we could add it to the rejected alternatives section?
>
> Thanks,
>
> Lucas
>
> On Fri, 20 Sep 2019 at 11:46, Brian Byrne <bb...@confluent.io> wrote:
>
> > I've updated the 'Proposed Changes' to include two new producer
> > configuration variables: topic.expiry.ms and topic.refresh.ms. Please
> take
> > a look.
> >
> > Thanks,
> > Brian
> >
> > On Tue, Sep 17, 2019 at 12:59 PM Brian Byrne <bb...@confluent.io>
> wrote:
> >
> > > Dev team,
> > >
> > > Requesting discussion for improvement to the producer when dealing
> with a
> > > large number of topics.
> > >
> > > KIP:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > >
> > > JIRA: https://issues.apache.org/jira/browse/KAFKA-8904
> > >
> > > Thoughts and feedback would be appreciated.
> > >
> > > Thanks,
> > > Brian
> > >
> >
>


-- 
-- Guozhang