Posted to users@kafka.apache.org by Steven Wu <st...@gmail.com> on 2015/01/05 19:47:09 UTC

Re: Kafka 0.8.2 new producer blocking on metadata

" preinitialize.metadata=true/false" can help to certain extent. if the
kafka cluster is down, then metadata won't be available for a long time
(not just the first msg). so to be safe, we have to set "
metadata.fetch.timeout.ms=1" to fail fast as Paul mentioned. I can also
echo Jay's comment that on-demand fetch of metadata might be more
efficient, since cluster may have many topics that a particular producer
may not care.
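
A fail-fast producer configuration along those lines might look like the
sketch below (the broker addresses and serializers are placeholders, not
something from this thread):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class FailFastProducerConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; substitute your own cluster.
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            // Give up on the metadata fetch almost immediately, so send()
            // fails fast instead of stalling the calling thread.
            props.put("metadata.fetch.timeout.ms", "1");

            KafkaProducer<String, String> producer =
                    new KafkaProducer<String, String>(props);
            producer.close();
        }
    }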

So I plan to do something similar to what Paul described (a rough sketch
follows the list):
- metadata.fetch.timeout.ms=1
- enqueue messages to a pending queue when topic metadata is not available
- have a background thread check when metadata becomes available and drain
the pending queue
- optionally, prime topic metadata asynchronously during init (if
configured)
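
The class and helper names in the sketch are made up for illustration,
queue bounds and shutdown are simplified, and the exact way a metadata
timeout is reported (thrown vs. a failed future) can differ between client
versions, so treat it as a sketch rather than a drop-in implementation.

    import java.util.Properties;
    import java.util.concurrent.*;
    import org.apache.kafka.clients.producer.*;

    // Illustrative only: a best-effort, non-blocking facade over the new producer.
    public class NonBlockingProducer {
        private final KafkaProducer<String, String> producer;
        // Bounded queue of records parked because topic metadata was missing.
        private final BlockingQueue<ProducerRecord<String, String>> pending =
                new LinkedBlockingQueue<ProducerRecord<String, String>>(10000);
        private final ScheduledExecutorService drainer =
                Executors.newSingleThreadScheduledExecutor();

        public NonBlockingProducer(Properties props) {
            // Fail fast instead of blocking the caller while metadata is fetched.
            props.put("metadata.fetch.timeout.ms", "1");
            this.producer = new KafkaProducer<String, String>(props);
            // Background thread: periodically retry records in the pending queue.
            drainer.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    drainPending();
                }
            }, 1, 1, TimeUnit.SECONDS);
        }

        /** Never blocks the caller; parks the record if metadata is unavailable. */
        public void send(ProducerRecord<String, String> record) {
            if (!trySend(record) && !pending.offer(record)) {
                // Queue full: this is best-effort delivery, so drop (or count) here.
            }
        }

        // Returns false when the send fails immediately, which with a 1 ms
        // metadata timeout almost always means "metadata not available yet".
        private boolean trySend(ProducerRecord<String, String> record) {
            try {
                Future<RecordMetadata> result = producer.send(record);
                if (result.isDone()) {
                    result.get(); // an already-failed future throws here
                }
                return true;
            } catch (Exception metadataLikelyMissing) {
                return false;
            }
        }

        private void drainPending() {
            ProducerRecord<String, String> record;
            while ((record = pending.poll()) != null) {
                if (!trySend(record)) {
                    pending.offer(record); // still no metadata; retry next tick
                    return;
                }
            }
        }

        public void close() {
            drainer.shutdownNow();
            producer.close();
        }
    }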

Just wondering whether the above should be the default best-effort,
non-blocking delivery behavior in the Kafka clients, so we don't all have
to reinvent the wheel.

Thanks,
Steven



On Mon, Dec 29, 2014 at 11:48 AM, Jay Kreps <ja...@gmail.com> wrote:

> I don't think a separate queue will be a very simple solution to implement.
>
> Could you describe your use case a little bit more. It does seem to me that
> as long as the metadata fetch happens only once and the blocking has a
> tight time bound this should be okay in any use case I can imagine. And, of
> course, by default the client blocks anyway whenever you exhaust the memory
> buffer space. But it sounds like you feel it isn't. Maybe you could
> describe the scenario a bit?
>
> I think one thing we could do is what was discussed in another thread,
> namely add an option like
>   preinitialize.metadata=true/false
> which would default to false. When true this would cause the producer to
> just initialize metadata for all topics when it is created. Note that this
> then brings back the opposite problem--doing remote communication during
> initialization which tends to bite a lot of people. But since this would be
> an option that would default to false perhaps it would be less likely to
> come as a surprise.
>
> -Jay
>
> On Mon, Dec 29, 2014 at 8:38 AM, Steven Wu <st...@gmail.com> wrote:
>
> > +1. it should be truly async in all cases.
> >
> > I understand some challenges that Jay listed in the other thread. But we
> > need a solution nonetheless. e.g. can we maintain a separate
> > list/queue/buffer for pending messages without metadata.
> >
> > On Tue, Dec 23, 2014 at 12:57 PM, John Boardman <boardmanjohnw@gmail.com>
> > wrote:
> >
> > > I was just fighting this same situation. I never expected the new
> > > producer send() method to block as it returns a Future and accepts a
> > > Callback. However, when I tried my unit test, just replacing the old
> > > producer with the new, I immediately started getting timeouts waiting
> > > for metadata. I struggled with this until I went into the source code
> > > and found the wait() that waits for the metadata.
> > >
> > > At that point I realized that this new "async" producer would have to
> > > be executed on its own thread, unlike the old producer, which
> > > complicates my code unnecessarily. I totally agree with Paul that the
> > > contract of send() is being completely violated with internal code
> > > that can block.
> > >
> > > I did try fetching the metadata first, but that only worked for a few
> > > calls before the producer decided it was time to update the metadata
> > > again.
> > >
> > > Again, I agree with Paul that this API should be fixed so that it is
> > > truly asynchronous in all cases. Otherwise, it cannot be used on the
> > > main thread of an application as it will block and fail.
> > >
> >
>

Re: Kafka 0.8.2 new producer blocking on metadata

Posted by Paul Pearcy <pp...@gmail.com>.
Hi Steven,
  Speaking only for myself, I agree with you. I think these settings/tweaks
are the easiest short-term way to get some proper non-blocking behavior.
Long term, it seems like the right approach is a secondary queue in the
client that holds raw messages until metadata is available, and then starts
blocking or dropping messages once too many are queued.

For those interested, I submitted a patch to add the following options:
pre.initialize.topics
pre.initialize.timeout.ms

And a new public method, isInitialized(), that the caller can check to
decide whether to blow up or to accept the failure and continue. If
initialization has not completed, any sends will fail fast until it does.
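
For what it's worth, if that patch were applied, caller-side usage might
look roughly like the sketch below. The pre.initialize.* properties and
isInitialized() come from the proposed patch, not the released 0.8.2
client (the call is left commented out for that reason), and the broker
address and topic name are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PreInitializedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // placeholder
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            // Proposed options from the KAFKA-1835 patch (not in the released client):
            props.put("pre.initialize.topics", "my-topic");
            props.put("pre.initialize.timeout.ms", "5000");

            KafkaProducer<String, String> producer =
                    new KafkaProducer<String, String>(props);

            // Proposed method from the patch: decide whether to blow up or continue.
            // if (!producer.isInitialized()) {
            //     throw new IllegalStateException("Kafka metadata not ready at startup");
            // }

            producer.send(new ProducerRecord<String, String>("my-topic", "hello"));
            producer.close();
        }
    }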

Patch is attached here:
https://issues.apache.org/jira/browse/KAFKA-1835

I'm not familiar with Kafka's processes, so any feedback is welcome.

Thanks,
Paul
