You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by Jiangjie Qin <jq...@linkedin.com.INVALID> on 2015/03/07 21:25:53 UTC

[DISCUSSION] KIP-15 close(timeout) for producer

Hi,

I just created a KIP for adding a close(timeout) to new producer. Most of the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has already done a lot of work.
Since this is an interface change so we are going through the KIP process.
Here is the KIP link:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53739782

Thanks.

Jiangjie (Becket) Qin

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Jay,

I have updated the KIP as you suggested.

-Jiangjie (Becket) Qin

On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>Hey Jiangjie,
>
>Can you capture the full motivation and use cases for the feature? This
>mentions your interest in having a way of aborting from inside the
>Callback. But it doesn't really explain that usage or why other people
>would want to do that. It also doesn't list the primary use case for
>having
>close with a bounded timeout which was to avoid blocking too long on
>shutdown.
>
>-Jay
>
>
>
>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hi,
>>
>> I just created a KIP for adding a close(timeout) to new producer. Most
>>of
>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
>> already done a lot of work.
>> Since this is an interface change so we are going through the KIP
>>process.
>> Here is the KIP link:
>> 
>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53739782
>>
>> Thanks.
>>
>> Jiangjie (Becket) Qin
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

OK, we are going to follow (1) then. It looks the KIP does not need
further updates.
Should we resume the voting process?

On 3/19/15, 3:28 PM, "Jay Kreps" <ja...@gmail.com> wrote:

>Yeah guys, I think this is one where the cure is worse than the disease.
>Let's just have the two close methods, that is confusing enough. Adding a
>third won't help. I understand that having a few microseconds before the
>thread shuts down could be unexpected but I think there is nothing
>inherently wrong with that.
>
>-Jay
>
>On Thu, Mar 19, 2015 at 3:09 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hi Neha,
>>
>> As Joel said, (1) and (2) provides almost exact same function as far as
>>I
>> can see. I actually have no strong preference.
>> The only difference is that (2) provides the close() semantic meaning of
>> making sure all the resources have gone away, at the cost of adding
>> another abort() interface.
>> The simplicity of (1) is also attractive but at a slight cost of
>>clarity.
>> I am slightly leaning towards (2) but would also be OK if we pursue (1).
>>
>> Thanks.
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On 3/19/15, 2:46 PM, "Joel Koshy" <jj...@gmail.com> wrote:
>>
>> >(1) should work, but as Jun suggested earlier in the thread it is
>> >slightly misleading. The (intuitive) post-condition of "close" is that
>> >the producer has shutdown - i.e., its sender thread, closed its
>> >metrics, serializer/deserializer, etc. That is not necessarily a
>> >post-condition of "close(0)" although one can contend that if you call
>> >the method in non-blocking mode (zero timeout) then it is reasonable
>> >to not expect that post-condition.
>> >
>> >So I think that although (2) adds one more API it brings "simplicity
>> >by virtue of overall clarity".
>> >
>> >I would be in favor of (2) but not strongly opposed to (1).
>> >
>> >Thanks,
>> >
>> >Joel
>> >
>> >On Thu, Mar 19, 2015 at 10:05:04AM -0700, Neha Narkhede wrote:
>> >> I'm in favor of (1) for the sake of simplicity and as Jay mentions to
>> >> reduce the number of different APIs. Can you explain when (1) does
>>not
>> >>work?
>> >>
>> >> On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <ja...@gmail.com>
>>wrote:
>> >>
>> >> > Personally I'm in favor of (1) just to reduce the number of
>>different
>> >>APIs.
>> >> > People will find the difference between abort and close subtle and
>> >> > confusing and the only instance where you want it is this somewhat
>> >>unusual
>> >> > case you guys are pursuing, right?
>> >> >
>> >> > -Jay
>> >> >
>> >> > On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
>> >><jq...@linkedin.com.invalid>
>> >> > wrote:
>> >> >
>> >> > > It looks we have another option and are now deciding between the
>> >> > following
>> >> > > two interfaces:
>> >> > >
>> >> > > 1. Close() + close(timeout)
>> >> > >   - timeout could be either positive or zero.
>> >> > >   - only close(0) can be called from sender thread
>> >> > >
>> >> > > 2. Close() + abort() + close(timeout)
>> >> > >   - timeout can either be positive or zero
>> >> > >   - only abort() can be called from sender thread
>> >> > >
>> >> > >   - abort() is equivalent to close(0) in 1) but does not join
>>sender
>> >> > > thread and does not close metrics.
>> >> > >   - Another thread has to call close() or close(timeout) in order
>> >>to make
>> >> > > sure the resources in producer are gone.
>> >> > >
>> >> > > The tow approach provides the same function we need, the
>>difference
>> >>is
>> >> > > approach 2) follows convention of close() and abort(). On the
>>other
>> >>hand,
>> >> > > approach 1) saves one interface compared with approach 2) but
>>does
>> >>not
>> >> > > follow the convention.
>> >> > >
>> >> > > When the two approaches come to user code, it is probably
>>something
>> >>like
>> >> > > this:
>> >> > >
>> >> > > Try {
>> >> > >   While(!finished)
>> >> > >     Producer.send(record, callback)
>> >> > > } catch (Exception e) {
>> >> > >   Producer.close(5)
>> >> > > }
>> >> > >
>> >> > > Class CallbackImpl implements Callback {
>> >> > >   onCompletion(RecordMetadata metadata Exception e) {
>> >> > >     If (e != null)
>> >> > >       Abort() / close()
>> >> > >   }
>> >> > > }
>> >> > >
>> >> > > Because the two approach leads to almost the same user code,
>> >>assuming
>> >> > > users are always calling producer.close() as a clean up step,
>> >>personally
>> >> > I
>> >> > > prefer approach 2) as it follows convention.
>> >> > >
>> >> > > Any thoughts?
>> >> > >
>> >> > > Jiangjie (Becket) Qin
>> >> > >
>> >> > >
>> >> > > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> >> > >
>> >> > > >Hi Jun,
>> >> > > >
>> >> > > >Yes, as Guozhang said, the main reason we set a flag is because
>> >>close(0)
>> >> > > >is expected to be called by sender thread itself.
>> >> > > >If we want to maintain the semantic meaning of close(), one
>> >>alternative
>> >> > is
>> >> > > >to have an abort() method does the same thing as close(0) except
>> >> > cleanup.
>> >> > > >And in close(timeout), after timeout we call abort() and join
>>the
>> >>sender
>> >> > > >thread. This was one of the previous proposal. We merged abort
>>to
>> >> > close(0)
>> >> > > >because they are almost doing the same thing. But from what you
>> >> > mentioned,
>> >> > > >it might make sense to have two separate methods.
>> >> > > >
>> >> > > >Thanks.
>> >> > > >
>> >> > > >Jiangjie (Becket) Qin
>> >> > > >
>> >> > > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com>
>>wrote:
>> >> > > >
>> >> > > >>Yeah in this sense the sender thread will not exist immediately
>> >>in the
>> >> > > >>close(0) call, but will only terminate after the current
>>response
>> >>batch
>> >> > > >>has
>> >> > > >>been processed, as will the producer instance itself.
>> >> > > >>
>> >> > > >>There is a reason for this though: for a clean shutdown the
>>caller
>> >> > thread
>> >> > > >>has to wait for the sender thread to join before closing the
>> >>producer
>> >> > > >>instance, but this cannot be achieve if close(0) is called by
>>the
>> >> > sender
>> >> > > >>thread itself (for example in KAFKA-1659, there is a proposal
>>from
>> >> > Andrew
>> >> > > >>Stein on using thread.interrupt and thread.stop, but if it is
>> >>called by
>> >> > > >>the
>> >> > > >>ioThread itself the stop call will fail). Hence we came up with
>> >>the
>> >> > flag
>> >> > > >>approach to let the sender thread to close as soon as it is at
>>the
>> >> > > >>barrier
>> >> > > >>of the run loop.
>> >> > > >>
>> >> > > >>Guozhang
>> >> > > >>
>> >> > > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io>
>> wrote:
>> >> > > >>
>> >> > > >>> Hmm, does that mean that after close(0), the sender thread is
>> >>not
>> >> > > >>>necessary
>> >> > > >>> gone? Normally, after closing an entity, we expect all
>>internal
>> >> > threads
>> >> > > >>> associated with the entity are shut down completely.
>> >> > > >>>
>> >> > > >>> Thanks,
>> >> > > >>>
>> >> > > >>> Jun
>> >> > > >>>
>> >> > > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>> >> > > >>><jq...@linkedin.com.invalid>
>> >> > > >>> wrote:
>> >> > > >>>
>> >> > > >>> > Hi Jun,
>> >> > > >>> >
>> >> > > >>> > Close(0) will set two flags in sender. Running=false and a
>> >>newly
>> >> > > >>>added
>> >> > > >>> > forceClose=true. It will also set accumulator.closed=true
>>so
>> >>no
>> >> > > >>>further
>> >> > > >>> > producer.send() will succeed.
>> >> > > >>> > The sender thread will finish executing all the callbacks
>>in
>> >> > current
>> >> > > >>> batch
>> >> > > >>> > of responses, then it will see the forceClose flag. It will
>> >>just
>> >> > fail
>> >> > > >>>all
>> >> > > >>> > the incomplete batches in the producer and exit.
>> >> > > >>> > So close(0) is a non-blocking call and sender thread will
>>not
>> >>try
>> >> > to
>> >> > > >>>join
>> >> > > >>> > itself in close(0).
>> >> > > >>> >
>> >> > > >>> > Thanks.
>> >> > > >>> >
>> >> > > >>> > Jiangjie (Becket) Qin
>> >> > > >>> >
>> >> > > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
>> >> > > >>> >
>> >> > > >>> > >How does close(0) work if it's called from the sender
>> >>thread? If
>> >> > > >>> close(0)
>> >> > > >>> > >needs to wait for the sender thread to join, wouldn't this
>> >>cause a
>> >> > > >>> > >deadlock?
>> >> > > >>> > >
>> >> > > >>> > >Thanks,
>> >> > > >>> > >
>> >> > > >>> > >Jun
>> >> > > >>> > >
>> >> > > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>> >> > > >>><jqin@linkedin.com.invalid
>> >> > > >>> >
>> >> > > >>> > >wrote:
>> >> > > >>> > >
>> >> > > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
>> >>without
>> >> > > >>> > >> discussing with you :)
>> >> > > >>> > >>
>> >> > > >>> > >> Jiangjie (Becket) Qin
>> >> > > >>> > >>
>> >> > > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang"
>><wa...@gmail.com>
>> >> > wrote:
>> >> > > >>> > >>
>> >> > > >>> > >> >Thanks Jiangjie,
>> >> > > >>> > >> >
>> >> > > >>> > >> >After talking to you offline on this, I have been
>> >>convinced and
>> >> > > >>> > >>changed my
>> >> > > >>> > >> >preference to blocking. The immediate shutdown approach
>> >>does
>> >> > have
>> >> > > >>> some
>> >> > > >>> > >> >unsafeness in some cases.
>> >> > > >>> > >> >
>> >> > > >>> > >> >Guozhang
>> >> > > >>> > >> >
>> >> > > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>> >> > > >>> > >><jqin@linkedin.com.invalid
>> >> > > >>> > >> >
>> >> > > >>> > >> >wrote:
>> >> > > >>> > >> >
>> >> > > >>> > >> >> It looks that the problem we want to solve and the
>> >>purpose we
>> >> > > >>>want
>> >> > > >>> to
>> >> > > >>> > >> >> achieve is:
>> >> > > >>> > >> >> If user uses close() in callback, we want to let
>>user be
>> >> > aware
>> >> > > >>>that
>> >> > > >>> > >>they
>> >> > > >>> > >> >> should use close(0) instead of close() in the
>>callback.
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> We have agreed that we will have an error log to
>>inform
>> >>user
>> >> > > >>>about
>> >> > > >>> > >>this
>> >> > > >>> > >> >> mis-usage. The options differ in the way how we can
>> >>force
>> >> > user
>> >> > > >>>to
>> >> > > >>> > >>take a
>> >> > > >>> > >> >> look at that error log.
>> >> > > >>> > >> >> There are two scenarios:
>> >> > > >>> > >> >> 1. User does not expect the program to exit.
>> >> > > >>> > >> >> 2. User expect the program to exit.
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> For scenario 1), blocking will probably delay the
>> >>discovery
>> >> > of
>> >> > > >>>the
>> >> > > >>> > >> >> problem. Calling close(0) exposes the problem
>>quicker.
>> >>In
>> >> > this
>> >> > > >>> > >>scenario
>> >> > > >>> > >> >> producer just encounter a send failure when running
>> >>normally.
>> >> > > >>> > >> >> For scenario 2), blocking will expose the problem
>>quick.
>> >> > > >>>Calling
>> >> > > >>> > >> >>close(-1)
>> >> > > >>> > >> >> might hide the problem. This scenario might include:
>>a)
>> >>Unit
>> >> > > >>>test
>> >> > > >>> > >>for a
>> >> > > >>> > >> >> send failure. b) Message sending during a close()
>>call
>> >>from a
>> >> > > >>>user
>> >> > > >>> > >> >>thread.
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> So as a summary table:
>> >> > > >>> > >> >>
>> >> > > >>> > >> >>                   Scenario 1)
>> >> >  Scenario
>> >> > > >>>2)
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> Blocking      Delay problem discovery
>>Guaranteed
>> >> > > >>>problem
>> >> > > >>> > >> >>discovery
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
>> >>might
>> >> > be
>> >> > > >>> hidden
>> >> > > >>> > >> >>
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> Personally I prefer blocking because it seems
>>providing
>> >>more
>> >> > > >>> > >>guarantees
>> >> > > >>> > >> >> and safer.
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> Thanks.
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> Jiangjie (Becket) Qin
>> >> > > >>> > >> >>
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
>> >><wa...@gmail.com>
>> >> > > >>>wrote:
>> >> > > >>> > >> >>
>> >> > > >>> > >> >> >HI Jiangjie,
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >As far as I understand calling close() in the
>>ioThread
>> >>is
>> >> > not
>> >> > > >>> > >>common,
>> >> > > >>> > >> >>as
>> >> > > >>> > >> >> >it
>> >> > > >>> > >> >> >may only trigger when we saw some non-retriable
>>error.
>> >>Hence
>> >> > > >>>when
>> >> > > >>> > >>user
>> >> > > >>> > >> >>run
>> >> > > >>> > >> >> >their program it is unlikely that close() will be
>> >>triggered
>> >> > > >>>and
>> >> > > >>> > >>problem
>> >> > > >>> > >> >> >will be detected. So it seems to me that from the
>>error
>> >> > > >>>detection
>> >> > > >>> > >> >>aspect
>> >> > > >>> > >> >> >these two options seems to be the same as people
>>will
>> >> > usually
>> >> > > >>> > >>detect it
>> >> > > >>> > >> >> >from the producer metrics all dropping to 0.
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >Guozhang
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> >> > > >>> > >> >><jq...@linkedin.com.invalid>
>> >> > > >>> > >> >> >wrote:
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >> It seems there are two options we can choose from
>> >>when
>> >> > > >>>close()
>> >> > > >>> is
>> >> > > >>> > >> >>called
>> >> > > >>> > >> >> >> from sender thread (callback):
>> >> > > >>> > >> >> >> 1. Log an error and close the producer using
>> >>close(-1)
>> >> > > >>> > >> >> >> 2. Log an error and block.
>> >> > > >>> > >> >> >> (Throwing an exception will not work because we
>> >>catch all
>> >> > > >>>the
>> >> > > >>> > >> >>exception
>> >> > > >>> > >> >> >> thrown from user callback. It will just lead to an
>> >>error
>> >> > > >>>log.)
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> My concern for the first option is that the
>>producer
>> >>will
>> >> > be
>> >> > > >>> > >>closed
>> >> > > >>> > >> >>even
>> >> > > >>> > >> >> >> if we logged and error. I am wondering if some
>>user
>> >>would
>> >> > > >>>not
>> >> > > >>> even
>> >> > > >>> > >> >>take
>> >> > > >>> > >> >> >>a
>> >> > > >>> > >> >> >> look at the log if producer is closed normally.
>> >>Because
>> >> > from
>> >> > > >>>the
>> >> > > >>> > >> >> >>programs
>> >> > > >>> > >> >> >> behavior, everything looks good. If that is the
>> >>case, the
>> >> > > >>>error
>> >> > > >>> > >> >>message
>> >> > > >>> > >> >> >>we
>> >> > > >>> > >> >> >> logged probably will just be ignored until some
>>day
>> >>when
>> >> > > >>>people
>> >> > > >>> > >>check
>> >> > > >>> > >> >> >>the
>> >> > > >>> > >> >> >> log and see it.
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> As for the second option, because producer does
>>not
>> >>close
>> >> > > >>>but
>> >> > > >>> > >>blocks.
>> >> > > >>> > >> >> >>User
>> >> > > >>> > >> >> >> will notice this the first time they run the
>> >>program. They
>> >> > > >>> > >>probably
>> >> > > >>> > >> >>will
>> >> > > >>> > >> >> >> look at the log to see why producer could not be
>> >>closed
>> >> > and
>> >> > > >>>they
>> >> > > >>> > >>will
>> >> > > >>> > >> >> >>see
>> >> > > >>> > >> >> >> the error log we put there. So they will get
>>informed
>> >> > about
>> >> > > >>>this
>> >> > > >>> > >> >> >>mis-usage
>> >> > > >>> > >> >> >> of close() in sender thread the first time they
>>run
>> >>the
>> >> > code
>> >> > > >>> > >>instead
>> >> > > >>> > >> >>of
>> >> > > >>> > >> >> >> some time later.
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> Personally I prefer the second one because it is
>>more
>> >> > > >>>obvious
>> >> > > >>> that
>> >> > > >>> > >> >> >> something was wrong.
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> Jiangjie (Becket) Qin
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
>> >><wa...@gmail.com>
>> >> > > >>> wrote:
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >> >Yeah I agree we should not silently change the
>> >>behavior
>> >> > of
>> >> > > >>>the
>> >> > > >>> > >> >>function
>> >> > > >>> > >> >> >> >with the given parameters; and I would prefer
>> >> > > >>> > >> >> >>error-logging-and-shutdown
>> >> > > >>> > >> >> >> >over blocking when close(>0) is used, since as
>>Neha
>> >> > > >>>suggested
>> >> > > >>> > >> >>blocking
>> >> > > >>> > >> >> >> >would also not proceed with sending any data, bu
>> >>will
>> >> > just
>> >> > > >>>let
>> >> > > >>> > >> >>users to
>> >> > > >>> > >> >> >> >realize the issue later than sooner.
>> >> > > >>> > >> >> >> >
>> >> > > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>> >> > > >>> > >><ne...@confluent.io>
>> >> > > >>> > >> >> >>wrote:
>> >> > > >>> > >> >> >> >
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > And I also agree it is better if we can make
>> >>producer
>> >> > > >>>block
>> >> > > >>> > >>when
>> >> > > >>> > >> >> >> >> > close() is called from sender thread so user
>> >>will
>> >> > > >>>notice
>> >> > > >>> > >> >>something
>> >> > > >>> > >> >> >> >>went
>> >> > > >>> > >> >> >> >> > wrong.
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >> This isn't a great experience either. Why can't
>> >>we just
>> >> > > >>>throw
>> >> > > >>> > >>an
>> >> > > >>> > >> >> >> >>exception
>> >> > > >>> > >> >> >> >> for a behavior we know is incorrect and we'd
>>like
>> >>the
>> >> > > >>>user to
>> >> > > >>> > >> >>know.
>> >> > > >>> > >> >> >> >> Blocking as a means of doing that seems wrong
>>and
>> >> > > >>>annoying.
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>> >> > > >>> > >><ja...@gmail.com>
>> >> > > >>> > >> >> >> wrote:
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >> > Cool.
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > I think blocking is good or alternately
>> >>throwing an
>> >> > > >>> exception
>> >> > > >>> > >> >> >>directly
>> >> > > >>> > >> >> >> >> from
>> >> > > >>> > >> >> >> >> > close(). Basically I would just worry about
>> >>subtly
>> >> > > >>>doing
>> >> > > >>> > >> >>something
>> >> > > >>> > >> >> >> >> slightly
>> >> > > >>> > >> >> >> >> > different from what the user asked for as it
>> >>will be
>> >> > > >>>hard
>> >> > > >>> to
>> >> > > >>> > >> >>notice
>> >> > > >>> > >> >> >> >>that
>> >> > > >>> > >> >> >> >> > behavior difference.
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > -Jay
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> >> > > >>> > >> >> >> >><jqin@linkedin.com.invalid
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > wrote:
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >> > > Hi Jay,
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
>> >>thinks
>> >> > as
>> >> > > >>> long
>> >> > > >>> > >>as
>> >> > > >>> > >> >>we
>> >> > > >>> > >> >> >> >>have
>> >> > > >>> > >> >> >> >> > > consistent define for timeout across Kafka
>> >> > interface,
>> >> > > >>> there
>> >> > > >>> > >> >>would
>> >> > > >>> > >> >> >> >>be no
>> >> > > >>> > >> >> >> >> > > problem. And I also agree it is better if
>>we
>> >>can
>> >> > make
>> >> > > >>> > >>producer
>> >> > > >>> > >> >> >>block
>> >> > > >>> > >> >> >> >> when
>> >> > > >>> > >> >> >> >> > > close() is called from sender thread so
>>user
>> >>will
>> >> > > >>>notice
>> >> > > >>> > >> >> >>something
>> >> > > >>> > >> >> >> >>went
>> >> > > >>> > >> >> >> >> > > wrong.
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > Thanks.
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>> >> > > >>><ja...@gmail.com>
>> >> > > >>> > >> wrote:
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > >Hey Jiangjie,
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >I think this is going to be very confusing
>> >>that
>> >> > > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
>> >> > > >>> > >> >> >> >> > > >  close(-1) waits for 0.
>> >> > > >>> > >> >> >> >> > > >I understand this appears in other apis,
>>but
>> >>it
>> >> > is a
>> >> > > >>> > >>constant
>> >> > > >>> > >> >> >> >>cause of
>> >> > > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
>> >>need a
>> >> > way
>> >> > > >>>to
>> >> > > >>> > >>wait
>> >> > > >>> > >> >> >> >> indefinitely
>> >> > > >>> > >> >> >> >> > > >as
>> >> > > >>> > >> >> >> >> > > >we already have close() so having a
>>magical
>> >> > constant
>> >> > > >>>for
>> >> > > >>> > >> >>that is
>> >> > > >>> > >> >> >> >> > > >redundant.
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
>> >>already
>> >> > > >>>possible
>> >> > > >>> > >>and
>> >> > > >>> > >> >> >>would
>> >> > > >>> > >> >> >> >> block
>> >> > > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
>> >>change
>> >> > the
>> >> > > >>> > >>behavior
>> >> > > >>> > >> >>is
>> >> > > >>> > >> >> >> >> probably
>> >> > > >>> > >> >> >> >> > > >not right. I.e. if the user calls close()
>>in
>> >>the
>> >> > > >>> callback
>> >> > > >>> > >> >>there
>> >> > > >>> > >> >> >>is
>> >> > > >>> > >> >> >> >> > > >actually
>> >> > > >>> > >> >> >> >> > > >some misunderstanding and they need to
>>think
>> >>more,
>> >> > > >>> > >>silently
>> >> > > >>> > >> >> >>making
>> >> > > >>> > >> >> >> >> this
>> >> > > >>> > >> >> >> >> > > >not
>> >> > > >>> > >> >> >> >> > > >block will hide the problem from them
>>which
>> >>is the
>> >> > > >>> > >>opposite
>> >> > > >>> > >> >>of
>> >> > > >>> > >> >> >> >>what we
>> >> > > >>> > >> >> >> >> > > >want.
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >-Jay
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie
>>Qin
>> >> > > >>> > >> >> >> >> <jqin@linkedin.com.invalid
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > > >wrote:
>> >> > > >>> > >> >> >> >> > > >
>> >> > > >>> > >> >> >> >> > > >> Hey Joe & Jay,
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
>> >>thread.
>> >> > > >>>Since it
>> >> > > >>> > >> >>seems
>> >> > > >>> > >> >> >>we
>> >> > > >>> > >> >> >> >> > probably
>> >> > > >>> > >> >> >> >> > > >> will have more discussion on this, I am
>> >>just
>> >> > > >>>replying
>> >> > > >>> > >>from
>> >> > > >>> > >> >>the
>> >> > > >>> > >> >> >> >> > > >>discussion
>> >> > > >>> > >> >> >> >> > > >> thread here.
>> >> > > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it
>>less
>> >>like
>> >> > > >>> > >>half-baked,
>> >> > > >>> > >> >> >> >>apologize
>> >> > > >>> > >> >> >> >> > for
>> >> > > >>> > >> >> >> >> > > >> the rush...
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> The contract in current KIP is:
>> >> > > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
>> >>either
>> >> > are
>> >> > > >>>sent
>> >> > > >>> > >>or
>> >> > > >>> > >> >> >>reach
>> >> > > >>> > >> >> >> >> > request
>> >> > > >>> > >> >> >> >> > > >> timeout.
>> >> > > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
>> >>close
>> >> > > >>> > >>immediately
>> >> > > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
>> >> > equivalent
>> >> > > >>>to
>> >> > > >>> > >> >>close(),
>> >> > > >>> > >> >> >> >>i.e.
>> >> > > >>> > >> >> >> >> > Wait
>> >> > > >>> > >> >> >> >> > > >> until all requests are sent or reach
>> >>request
>> >> > > >>>timeout
>> >> > > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) -
>>try
>> >>the
>> >> > > >>>best to
>> >> > > >>> > >> >>finish
>> >> > > >>> > >> >> >> >> sending
>> >> > > >>> > >> >> >> >> > > >>in 5
>> >> > > >>> > >> >> >> >> > > >> milliseconds, if something went wrong,
>>just
>> >> > > >>>shutdown
>> >> > > >>> the
>> >> > > >>> > >> >> >>producer
>> >> > > >>> > >> >> >> >> > > >>anyway,
>> >> > > >>> > >> >> >> >> > > >> my callback will handle the failures.
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> About how we define what timeout value
>> >>stands
>> >> > for,
>> >> > > >>>I
>> >> > > >>> > >> >>actually
>> >> > > >>> > >> >> >> >> > struggled
>> >> > > >>> > >> >> >> >> > > >>a
>> >> > > >>> > >> >> >> >> > > >> little bit when wrote the patch.
>> >>Intuitively,
>> >> > > >>>close(0)
>> >> > > >>> > >> >>should
>> >> > > >>> > >> >> >> >>mean
>> >> > > >>> > >> >> >> >> > > >> immediately, however it seems that all
>>the
>> >> > > >>>existing
>> >> > > >>> java
>> >> > > >>> > >> >>class
>> >> > > >>> > >> >> >> >>have
>> >> > > >>> > >> >> >> >> > this
>> >> > > >>> > >> >> >> >> > > >> convention of timeout=0 means no
>>timeout or
>> >> > never
>> >> > > >>> > >>timeout
>> >> > > >>> > >> >> >> >> > > >>(Thread.join(0),
>> >> > > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the
>>dilemma
>> >>is
>> >> > > >>>either we
>> >> > > >>> > >> >>follow
>> >> > > >>> > >> >> >>the
>> >> > > >>> > >> >> >> >> > > >> intuition or we follow the convention.
>> >>What I
>> >> > > >>>chose is
>> >> > > >>> > >>to
>> >> > > >>> > >> >> >>follow
>> >> > > >>> > >> >> >> >>the
>> >> > > >>> > >> >> >> >> > > >> convention but document the interface to
>> >>let
>> >> > user
>> >> > > >>>be
>> >> > > >>> > >>aware
>> >> > > >>> > >> >>of
>> >> > > >>> > >> >> >>the
>> >> > > >>> > >> >> >> >> > usage.
>> >> > > >>> > >> >> >> >> > > >> The reason is that I think
>> >>producer.close() is a
>> >> > > >>> public
>> >> > > >>> > >> >> >> >>interface so
>> >> > > >>> > >> >> >> >> > it
>> >> > > >>> > >> >> >> >> > > >> might be better to follow java
>>convention.
>> >> > Whereas
>> >> > > >>> > >> >>selector is
>> >> > > >>> > >> >> >> >>not a
>> >> > > >>> > >> >> >> >> > > >> public interface that used by user, so
>>as
>> >>long
>> >> > as
>> >> > > >>>it
>> >> > > >>> > >>makes
>> >> > > >>> > >> >> >>sense
>> >> > > >>> > >> >> >> >>to
>> >> > > >>> > >> >> >> >> > us,
>> >> > > >>> > >> >> >> >> > > >>it
>> >> > > >>> > >> >> >> >> > > >> is less a problem to be different from
>>java
>> >> > > >>> convention.
>> >> > > >>> > >> >>That
>> >> > > >>> > >> >> >>said
>> >> > > >>> > >> >> >> >> > since
>> >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
>> >> > interface,
>> >> > > >>>I
>> >> > > >>> > >>think
>> >> > > >>> > >> >>it
>> >> > > >>> > >> >> >> >>also
>> >> > > >>> > >> >> >> >> > makes
>> >> > > >>> > >> >> >> >> > > >> sense to make producer.close() to have
>>the
>> >>same
>> >> > > >>> > >>definition
>> >> > > >>> > >> >>of
>> >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout).
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> The main argument for keeping a timeout
>>in
>> >>close
>> >> > > >>>would
>> >> > > >>> > >>be
>> >> > > >>> > >> >> >> >>separating
>> >> > > >>> > >> >> >> >> > the
>> >> > > >>> > >> >> >> >> > > >> close timeout from request timeout,
>>which
>> >> > probably
>> >> > > >>> makes
>> >> > > >>> > >> >> >>sense. I
>> >> > > >>> > >> >> >> >> > would
>> >> > > >>> > >> >> >> >> > > >> guess typically the request timeout
>>would
>> >>be
>> >> > long
>> >> > > >>> (e.g.
>> >> > > >>> > >>60
>> >> > > >>> > >> >> >> >>seconds)
>> >> > > >>> > >> >> >> >> > > >> because we might want to consider
>>retries
>> >>with
>> >> > > >>>back
>> >> > > >>> off
>> >> > > >>> > >> >>time.
>> >> > > >>> > >> >> >>If
>> >> > > >>> > >> >> >> >>we
>> >> > > >>> > >> >> >> >> > have
>> >> > > >>> > >> >> >> >> > > >> multiple batches in accumulator, in
>>worst
>> >>case
>> >> > > >>>that
>> >> > > >>> > >>could
>> >> > > >>> > >> >>take
>> >> > > >>> > >> >> >> >>up to
>> >> > > >>> > >> >> >> >> > > >> several minutes to complete all the
>> >>requests.
>> >> > But
>> >> > > >>>when
>> >> > > >>> > >>we
>> >> > > >>> > >> >> >>close a
>> >> > > >>> > >> >> >> >> > > >> producer, we might not want to wait for
>> >>that
>> >> > long
>> >> > > >>>as
>> >> > > >>> it
>> >> > > >>> > >> >>might
>> >> > > >>> > >> >> >> >>cause
>> >> > > >>> > >> >> >> >> > some
>> >> > > >>> > >> >> >> >> > > >> other problem like deployment tool
>>timeout.
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> There is also a subtle difference
>>between
>> >> > > >>> close(timeout)
>> >> > > >>> > >> >>and
>> >> > > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
>> >>flush() is
>> >> > to
>> >> > > >>> write
>> >> > > >>> > >> >>data
>> >> > > >>> > >> >> >>to
>> >> > > >>> > >> >> >> >>the
>> >> > > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to
>>wait
>> >>until
>> >> > > >>> request
>> >> > > >>> > >> >> >>timeout.
>> >> > > >>> > >> >> >> >>I
>> >> > > >>> > >> >> >> >> > think
>> >> > > >>> > >> >> >> >> > > >> that is why flush(timeout) looks
>>strange.
>> >>On the
>> >> > > >>>other
>> >> > > >>> > >> >>hand,
>> >> > > >>> > >> >> >>the
>> >> > > >>> > >> >> >> >>top
>> >> > > >>> > >> >> >> >> > > >> priority for close() is to close the
>> >>producer
>> >> > > >>>rather
>> >> > > >>> > >>than
>> >> > > >>> > >> >> >>flush()
>> >> > > >>> > >> >> >> >> > data,
>> >> > > >>> > >> >> >> >> > > >>so
>> >> > > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on
>>bounded
>> >> > waiting
>> >> > > >>>for
>> >> > > >>> > >>its
>> >> > > >>> > >> >>main
>> >> > > >>> > >> >> >> >>job.
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
>> >>flag.
>> >> > It
>> >> > > >>>is
>> >> > > >>> > >>not a
>> >> > > >>> > >> >> >>public
>> >> > > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
>> >>Changes
>> >> > > >>>section
>> >> > > >>> > >> >>which I
>> >> > > >>> > >> >> >> >> thought
>> >> > > >>> > >> >> >> >> > > >>was
>> >> > > >>> > >> >> >> >> > > >> supposed to provide implementation
>>details.
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> Thanks again for all the comments and
>> >> > suggestions!
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>> >> > > >>> jqin@linkedin.com>
>> >> > > >>> > >> >> wrote:
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
>> >> > comments.
>> >> > > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
>> >>if no
>> >> > > >>>further
>> >> > > >>> > >> >> >>comments
>> >> > > >>> > >> >> >> >>are
>> >> > > >>> > >> >> >> >> > > >> >received by tomorrow.
>> >> > > >>> > >> >> >> >> > > >> >
>> >> > > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>> >> > > >>> > >> >> >> >> > > >> >
>> >> > > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>> >> > > >>><jay.kreps@gmail.com
>> >> > > >>> >
>> >> > > >>> > >> >>wrote:
>> >> > > >>> > >> >> >> >> > > >> >
>> >> > > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>Can you capture the full motivation
>>and
>> >>use
>> >> > > >>>cases
>> >> > > >>> for
>> >> > > >>> > >>the
>> >> > > >>> > >> >> >> >>feature?
>> >> > > >>> > >> >> >> >> > > >>This
>> >> > > >>> > >> >> >> >> > > >> >>mentions your interest in having a
>>way of
>> >> > > >>>aborting
>> >> > > >>> > >>from
>> >> > > >>> > >> >> >>inside
>> >> > > >>> > >> >> >> >>the
>> >> > > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really
>>explain
>> >>that
>> >> > > >>>usage
>> >> > > >>> or
>> >> > > >>> > >>why
>> >> > > >>> > >> >> >>other
>> >> > > >>> > >> >> >> >> > people
>> >> > > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
>> >>list
>> >> > the
>> >> > > >>> > >>primary
>> >> > > >>> > >> >>use
>> >> > > >>> > >> >> >> >>case
>> >> > > >>> > >> >> >> >> for
>> >> > > >>> > >> >> >> >> > > >> >>having
>> >> > > >>> > >> >> >> >> > > >> >>close with a bounded timeout which
>>was to
>> >> > avoid
>> >> > > >>> > >>blocking
>> >> > > >>> > >> >>too
>> >> > > >>> > >> >> >> >>long
>> >> > > >>> > >> >> >> >> on
>> >> > > >>> > >> >> >> >> > > >> >>shutdown.
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>-Jay
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
>> >>Jiangjie Qin
>> >> > > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>> >> > > >>> > >> >> >> >> > > >> >
>> >> > > >>> > >> >> >> >> > > >> >>wrote:
>> >> > > >>> > >> >> >> >> > > >> >>
>> >> > > >>> > >> >> >> >> > > >> >>> Hi,
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>> >> > > >>>close(timeout)
>> >> > > >>> to
>> >> > > >>> > >>new
>> >> > > >>> > >> >> >> >> producer.
>> >> > > >>> > >> >> >> >> > > >>Most
>> >> > > >>> > >> >> >> >> > > >> >>>of
>> >> > > >>> > >> >> >> >> > > >> >>> the previous discussions are in
>> >>KAFKA-1660
>> >> > > >>>where
>> >> > > >>> > >>Parth
>> >> > > >>> > >> >> >> >> Brahmbhatt
>> >> > > >>> > >> >> >> >> > > >>has
>> >> > > >>> > >> >> >> >> > > >> >>> already done a lot of work.
>> >> > > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
>> >>we are
>> >> > > >>>going
>> >> > > >>> > >> >>through
>> >> > > >>> > >> >> >>the
>> >> > > >>> > >> >> >> >> KIP
>> >> > > >>> > >> >> >> >> > > >> >>>process.
>> >> > > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >>
>> >> > > >>> > >>
>> >> > > >>> >
>> >> > > >>>
>> >> > >
>> 
>>>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=537397
>>>>8
>> >> > > >>> > >> >> >> >> > > >> >>>2
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >> >>> Thanks.
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> >> > > >>> > >> >> >> >> > > >> >>>
>> >> > > >>> > >> >> >> >> > > >> >
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > > >>
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> > >
>> >> > > >>> > >> >> >> >> >
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >> --
>> >> > > >>> > >> >> >> >> Thanks,
>> >> > > >>> > >> >> >> >> Neha
>> >> > > >>> > >> >> >> >>
>> >> > > >>> > >> >> >> >
>> >> > > >>> > >> >> >> >
>> >> > > >>> > >> >> >> >
>> >> > > >>> > >> >> >> >--
>> >> > > >>> > >> >> >> >-- Guozhang
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >>
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >
>> >> > > >>> > >> >> >--
>> >> > > >>> > >> >> >-- Guozhang
>> >> > > >>> > >> >>
>> >> > > >>> > >> >>
>> >> > > >>> > >> >
>> >> > > >>> > >> >
>> >> > > >>> > >> >--
>> >> > > >>> > >> >-- Guozhang
>> >> > > >>> > >>
>> >> > > >>> > >>
>> >> > > >>> >
>> >> > > >>> >
>> >> > > >>>
>> >> > > >>
>> >> > > >>
>> >> > > >>
>> >> > > >>--
>> >> > > >>-- Guozhang
>> >> > > >
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> Neha
>> >
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Yeah guys, I think this is one where the cure is worse than the disease.
Let's just have the two close methods, that is confusing enough. Adding a
third won't help. I understand that having a few microseconds before the
thread shuts down could be unexpected but I think there is nothing
inherently wrong with that.

-Jay

On Thu, Mar 19, 2015 at 3:09 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi Neha,
>
> As Joel said, (1) and (2) provides almost exact same function as far as I
> can see. I actually have no strong preference.
> The only difference is that (2) provides the close() semantic meaning of
> making sure all the resources have gone away, at the cost of adding
> another abort() interface.
> The simplicity of (1) is also attractive but at a slight cost of clarity.
> I am slightly leaning towards (2) but would also be OK if we pursue (1).
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
>
> On 3/19/15, 2:46 PM, "Joel Koshy" <jj...@gmail.com> wrote:
>
> >(1) should work, but as Jun suggested earlier in the thread it is
> >slightly misleading. The (intuitive) post-condition of "close" is that
> >the producer has shutdown - i.e., its sender thread, closed its
> >metrics, serializer/deserializer, etc. That is not necessarily a
> >post-condition of "close(0)" although one can contend that if you call
> >the method in non-blocking mode (zero timeout) then it is reasonable
> >to not expect that post-condition.
> >
> >So I think that although (2) adds one more API it brings "simplicity
> >by virtue of overall clarity".
> >
> >I would be in favor of (2) but not strongly opposed to (1).
> >
> >Thanks,
> >
> >Joel
> >
> >On Thu, Mar 19, 2015 at 10:05:04AM -0700, Neha Narkhede wrote:
> >> I'm in favor of (1) for the sake of simplicity and as Jay mentions to
> >> reduce the number of different APIs. Can you explain when (1) does not
> >>work?
> >>
> >> On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:
> >>
> >> > Personally I'm in favor of (1) just to reduce the number of different
> >>APIs.
> >> > People will find the difference between abort and close subtle and
> >> > confusing and the only instance where you want it is this somewhat
> >>unusual
> >> > case you guys are pursuing, right?
> >> >
> >> > -Jay
> >> >
> >> > On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
> >><jq...@linkedin.com.invalid>
> >> > wrote:
> >> >
> >> > > It looks we have another option and are now deciding between the
> >> > following
> >> > > two interfaces:
> >> > >
> >> > > 1. Close() + close(timeout)
> >> > >   - timeout could be either positive or zero.
> >> > >   - only close(0) can be called from sender thread
> >> > >
> >> > > 2. Close() + abort() + close(timeout)
> >> > >   - timeout can either be positive or zero
> >> > >   - only abort() can be called from sender thread
> >> > >
> >> > >   - abort() is equivalent to close(0) in 1) but does not join sender
> >> > > thread and does not close metrics.
> >> > >   - Another thread has to call close() or close(timeout) in order
> >>to make
> >> > > sure the resources in producer are gone.
> >> > >
> >> > > The tow approach provides the same function we need, the difference
> >>is
> >> > > approach 2) follows convention of close() and abort(). On the other
> >>hand,
> >> > > approach 1) saves one interface compared with approach 2) but does
> >>not
> >> > > follow the convention.
> >> > >
> >> > > When the two approaches come to user code, it is probably something
> >>like
> >> > > this:
> >> > >
> >> > > Try {
> >> > >   While(!finished)
> >> > >     Producer.send(record, callback)
> >> > > } catch (Exception e) {
> >> > >   Producer.close(5)
> >> > > }
> >> > >
> >> > > Class CallbackImpl implements Callback {
> >> > >   onCompletion(RecordMetadata metadata Exception e) {
> >> > >     If (e != null)
> >> > >       Abort() / close()
> >> > >   }
> >> > > }
> >> > >
> >> > > Because the two approach leads to almost the same user code,
> >>assuming
> >> > > users are always calling producer.close() as a clean up step,
> >>personally
> >> > I
> >> > > prefer approach 2) as it follows convention.
> >> > >
> >> > > Any thoughts?
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > >
> >> > > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >> > >
> >> > > >Hi Jun,
> >> > > >
> >> > > >Yes, as Guozhang said, the main reason we set a flag is because
> >>close(0)
> >> > > >is expected to be called by sender thread itself.
> >> > > >If we want to maintain the semantic meaning of close(), one
> >>alternative
> >> > is
> >> > > >to have an abort() method does the same thing as close(0) except
> >> > cleanup.
> >> > > >And in close(timeout), after timeout we call abort() and join the
> >>sender
> >> > > >thread. This was one of the previous proposal. We merged abort to
> >> > close(0)
> >> > > >because they are almost doing the same thing. But from what you
> >> > mentioned,
> >> > > >it might make sense to have two separate methods.
> >> > > >
> >> > > >Thanks.
> >> > > >
> >> > > >Jiangjie (Becket) Qin
> >> > > >
> >> > > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >> > > >
> >> > > >>Yeah in this sense the sender thread will not exist immediately
> >>in the
> >> > > >>close(0) call, but will only terminate after the current response
> >>batch
> >> > > >>has
> >> > > >>been processed, as will the producer instance itself.
> >> > > >>
> >> > > >>There is a reason for this though: for a clean shutdown the caller
> >> > thread
> >> > > >>has to wait for the sender thread to join before closing the
> >>producer
> >> > > >>instance, but this cannot be achieve if close(0) is called by the
> >> > sender
> >> > > >>thread itself (for example in KAFKA-1659, there is a proposal from
> >> > Andrew
> >> > > >>Stein on using thread.interrupt and thread.stop, but if it is
> >>called by
> >> > > >>the
> >> > > >>ioThread itself the stop call will fail). Hence we came up with
> >>the
> >> > flag
> >> > > >>approach to let the sender thread to close as soon as it is at the
> >> > > >>barrier
> >> > > >>of the run loop.
> >> > > >>
> >> > > >>Guozhang
> >> > > >>
> >> > > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io>
> wrote:
> >> > > >>
> >> > > >>> Hmm, does that mean that after close(0), the sender thread is
> >>not
> >> > > >>>necessary
> >> > > >>> gone? Normally, after closing an entity, we expect all internal
> >> > threads
> >> > > >>> associated with the entity are shut down completely.
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>>
> >> > > >>> Jun
> >> > > >>>
> >> > > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> >> > > >>><jq...@linkedin.com.invalid>
> >> > > >>> wrote:
> >> > > >>>
> >> > > >>> > Hi Jun,
> >> > > >>> >
> >> > > >>> > Close(0) will set two flags in sender. Running=false and a
> >>newly
> >> > > >>>added
> >> > > >>> > forceClose=true. It will also set accumulator.closed=true so
> >>no
> >> > > >>>further
> >> > > >>> > producer.send() will succeed.
> >> > > >>> > The sender thread will finish executing all the callbacks in
> >> > current
> >> > > >>> batch
> >> > > >>> > of responses, then it will see the forceClose flag. It will
> >>just
> >> > fail
> >> > > >>>all
> >> > > >>> > the incomplete batches in the producer and exit.
> >> > > >>> > So close(0) is a non-blocking call and sender thread will not
> >>try
> >> > to
> >> > > >>>join
> >> > > >>> > itself in close(0).
> >> > > >>> >
> >> > > >>> > Thanks.
> >> > > >>> >
> >> > > >>> > Jiangjie (Becket) Qin
> >> > > >>> >
> >> > > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> >> > > >>> >
> >> > > >>> > >How does close(0) work if it's called from the sender
> >>thread? If
> >> > > >>> close(0)
> >> > > >>> > >needs to wait for the sender thread to join, wouldn't this
> >>cause a
> >> > > >>> > >deadlock?
> >> > > >>> > >
> >> > > >>> > >Thanks,
> >> > > >>> > >
> >> > > >>> > >Jun
> >> > > >>> > >
> >> > > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> >> > > >>><jqin@linkedin.com.invalid
> >> > > >>> >
> >> > > >>> > >wrote:
> >> > > >>> > >
> >> > > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
> >>without
> >> > > >>> > >> discussing with you :)
> >> > > >>> > >>
> >> > > >>> > >> Jiangjie (Becket) Qin
> >> > > >>> > >>
> >> > > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com>
> >> > wrote:
> >> > > >>> > >>
> >> > > >>> > >> >Thanks Jiangjie,
> >> > > >>> > >> >
> >> > > >>> > >> >After talking to you offline on this, I have been
> >>convinced and
> >> > > >>> > >>changed my
> >> > > >>> > >> >preference to blocking. The immediate shutdown approach
> >>does
> >> > have
> >> > > >>> some
> >> > > >>> > >> >unsafeness in some cases.
> >> > > >>> > >> >
> >> > > >>> > >> >Guozhang
> >> > > >>> > >> >
> >> > > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >> > > >>> > >><jqin@linkedin.com.invalid
> >> > > >>> > >> >
> >> > > >>> > >> >wrote:
> >> > > >>> > >> >
> >> > > >>> > >> >> It looks that the problem we want to solve and the
> >>purpose we
> >> > > >>>want
> >> > > >>> to
> >> > > >>> > >> >> achieve is:
> >> > > >>> > >> >> If user uses close() in callback, we want to let user be
> >> > aware
> >> > > >>>that
> >> > > >>> > >>they
> >> > > >>> > >> >> should use close(0) instead of close() in the callback.
> >> > > >>> > >> >>
> >> > > >>> > >> >> We have agreed that we will have an error log to inform
> >>user
> >> > > >>>about
> >> > > >>> > >>this
> >> > > >>> > >> >> mis-usage. The options differ in the way how we can
> >>force
> >> > user
> >> > > >>>to
> >> > > >>> > >>take a
> >> > > >>> > >> >> look at that error log.
> >> > > >>> > >> >> There are two scenarios:
> >> > > >>> > >> >> 1. User does not expect the program to exit.
> >> > > >>> > >> >> 2. User expect the program to exit.
> >> > > >>> > >> >>
> >> > > >>> > >> >> For scenario 1), blocking will probably delay the
> >>discovery
> >> > of
> >> > > >>>the
> >> > > >>> > >> >> problem. Calling close(0) exposes the problem quicker.
> >>In
> >> > this
> >> > > >>> > >>scenario
> >> > > >>> > >> >> producer just encounter a send failure when running
> >>normally.
> >> > > >>> > >> >> For scenario 2), blocking will expose the problem quick.
> >> > > >>>Calling
> >> > > >>> > >> >>close(-1)
> >> > > >>> > >> >> might hide the problem. This scenario might include: a)
> >>Unit
> >> > > >>>test
> >> > > >>> > >>for a
> >> > > >>> > >> >> send failure. b) Message sending during a close() call
> >>from a
> >> > > >>>user
> >> > > >>> > >> >>thread.
> >> > > >>> > >> >>
> >> > > >>> > >> >> So as a summary table:
> >> > > >>> > >> >>
> >> > > >>> > >> >>                   Scenario 1)
> >> >  Scenario
> >> > > >>>2)
> >> > > >>> > >> >>
> >> > > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> >> > > >>>problem
> >> > > >>> > >> >>discovery
> >> > > >>> > >> >>
> >> > > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
> >>might
> >> > be
> >> > > >>> hidden
> >> > > >>> > >> >>
> >> > > >>> > >> >>
> >> > > >>> > >> >> Personally I prefer blocking because it seems providing
> >>more
> >> > > >>> > >>guarantees
> >> > > >>> > >> >> and safer.
> >> > > >>> > >> >>
> >> > > >>> > >> >> Thanks.
> >> > > >>> > >> >>
> >> > > >>> > >> >> Jiangjie (Becket) Qin
> >> > > >>> > >> >>
> >> > > >>> > >> >>
> >> > > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
> >><wa...@gmail.com>
> >> > > >>>wrote:
> >> > > >>> > >> >>
> >> > > >>> > >> >> >HI Jiangjie,
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >As far as I understand calling close() in the ioThread
> >>is
> >> > not
> >> > > >>> > >>common,
> >> > > >>> > >> >>as
> >> > > >>> > >> >> >it
> >> > > >>> > >> >> >may only trigger when we saw some non-retriable error.
> >>Hence
> >> > > >>>when
> >> > > >>> > >>user
> >> > > >>> > >> >>run
> >> > > >>> > >> >> >their program it is unlikely that close() will be
> >>triggered
> >> > > >>>and
> >> > > >>> > >>problem
> >> > > >>> > >> >> >will be detected. So it seems to me that from the error
> >> > > >>>detection
> >> > > >>> > >> >>aspect
> >> > > >>> > >> >> >these two options seems to be the same as people will
> >> > usually
> >> > > >>> > >>detect it
> >> > > >>> > >> >> >from the producer metrics all dropping to 0.
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >Guozhang
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >> > > >>> > >> >><jq...@linkedin.com.invalid>
> >> > > >>> > >> >> >wrote:
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >> It seems there are two options we can choose from
> >>when
> >> > > >>>close()
> >> > > >>> is
> >> > > >>> > >> >>called
> >> > > >>> > >> >> >> from sender thread (callback):
> >> > > >>> > >> >> >> 1. Log an error and close the producer using
> >>close(-1)
> >> > > >>> > >> >> >> 2. Log an error and block.
> >> > > >>> > >> >> >> (Throwing an exception will not work because we
> >>catch all
> >> > > >>>the
> >> > > >>> > >> >>exception
> >> > > >>> > >> >> >> thrown from user callback. It will just lead to an
> >>error
> >> > > >>>log.)
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> My concern for the first option is that the producer
> >>will
> >> > be
> >> > > >>> > >>closed
> >> > > >>> > >> >>even
> >> > > >>> > >> >> >> if we logged and error. I am wondering if some user
> >>would
> >> > > >>>not
> >> > > >>> even
> >> > > >>> > >> >>take
> >> > > >>> > >> >> >>a
> >> > > >>> > >> >> >> look at the log if producer is closed normally.
> >>Because
> >> > from
> >> > > >>>the
> >> > > >>> > >> >> >>programs
> >> > > >>> > >> >> >> behavior, everything looks good. If that is the
> >>case, the
> >> > > >>>error
> >> > > >>> > >> >>message
> >> > > >>> > >> >> >>we
> >> > > >>> > >> >> >> logged probably will just be ignored until some day
> >>when
> >> > > >>>people
> >> > > >>> > >>check
> >> > > >>> > >> >> >>the
> >> > > >>> > >> >> >> log and see it.
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> As for the second option, because producer does not
> >>close
> >> > > >>>but
> >> > > >>> > >>blocks.
> >> > > >>> > >> >> >>User
> >> > > >>> > >> >> >> will notice this the first time they run the
> >>program. They
> >> > > >>> > >>probably
> >> > > >>> > >> >>will
> >> > > >>> > >> >> >> look at the log to see why producer could not be
> >>closed
> >> > and
> >> > > >>>they
> >> > > >>> > >>will
> >> > > >>> > >> >> >>see
> >> > > >>> > >> >> >> the error log we put there. So they will get informed
> >> > about
> >> > > >>>this
> >> > > >>> > >> >> >>mis-usage
> >> > > >>> > >> >> >> of close() in sender thread the first time they run
> >>the
> >> > code
> >> > > >>> > >>instead
> >> > > >>> > >> >>of
> >> > > >>> > >> >> >> some time later.
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> Personally I prefer the second one because it is more
> >> > > >>>obvious
> >> > > >>> that
> >> > > >>> > >> >> >> something was wrong.
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> Jiangjie (Becket) Qin
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
> >><wa...@gmail.com>
> >> > > >>> wrote:
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >> >Yeah I agree we should not silently change the
> >>behavior
> >> > of
> >> > > >>>the
> >> > > >>> > >> >>function
> >> > > >>> > >> >> >> >with the given parameters; and I would prefer
> >> > > >>> > >> >> >>error-logging-and-shutdown
> >> > > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> >> > > >>>suggested
> >> > > >>> > >> >>blocking
> >> > > >>> > >> >> >> >would also not proceed with sending any data, bu
> >>will
> >> > just
> >> > > >>>let
> >> > > >>> > >> >>users to
> >> > > >>> > >> >> >> >realize the issue later than sooner.
> >> > > >>> > >> >> >> >
> >> > > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >> > > >>> > >><ne...@confluent.io>
> >> > > >>> > >> >> >>wrote:
> >> > > >>> > >> >> >> >
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > And I also agree it is better if we can make
> >>producer
> >> > > >>>block
> >> > > >>> > >>when
> >> > > >>> > >> >> >> >> > close() is called from sender thread so user
> >>will
> >> > > >>>notice
> >> > > >>> > >> >>something
> >> > > >>> > >> >> >> >>went
> >> > > >>> > >> >> >> >> > wrong.
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >> This isn't a great experience either. Why can't
> >>we just
> >> > > >>>throw
> >> > > >>> > >>an
> >> > > >>> > >> >> >> >>exception
> >> > > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
> >>the
> >> > > >>>user to
> >> > > >>> > >> >>know.
> >> > > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> >> > > >>>annoying.
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >> > > >>> > >><ja...@gmail.com>
> >> > > >>> > >> >> >> wrote:
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >> > Cool.
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > I think blocking is good or alternately
> >>throwing an
> >> > > >>> exception
> >> > > >>> > >> >> >>directly
> >> > > >>> > >> >> >> >> from
> >> > > >>> > >> >> >> >> > close(). Basically I would just worry about
> >>subtly
> >> > > >>>doing
> >> > > >>> > >> >>something
> >> > > >>> > >> >> >> >> slightly
> >> > > >>> > >> >> >> >> > different from what the user asked for as it
> >>will be
> >> > > >>>hard
> >> > > >>> to
> >> > > >>> > >> >>notice
> >> > > >>> > >> >> >> >>that
> >> > > >>> > >> >> >> >> > behavior difference.
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > -Jay
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> > > >>> > >> >> >> >><jqin@linkedin.com.invalid
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > wrote:
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >> > > Hi Jay,
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
> >>thinks
> >> > as
> >> > > >>> long
> >> > > >>> > >>as
> >> > > >>> > >> >>we
> >> > > >>> > >> >> >> >>have
> >> > > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> >> > interface,
> >> > > >>> there
> >> > > >>> > >> >>would
> >> > > >>> > >> >> >> >>be no
> >> > > >>> > >> >> >> >> > > problem. And I also agree it is better if we
> >>can
> >> > make
> >> > > >>> > >>producer
> >> > > >>> > >> >> >>block
> >> > > >>> > >> >> >> >> when
> >> > > >>> > >> >> >> >> > > close() is called from sender thread so user
> >>will
> >> > > >>>notice
> >> > > >>> > >> >> >>something
> >> > > >>> > >> >> >> >>went
> >> > > >>> > >> >> >> >> > > wrong.
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > Thanks.
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> >> > > >>><ja...@gmail.com>
> >> > > >>> > >> wrote:
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > >Hey Jiangjie,
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >I think this is going to be very confusing
> >>that
> >> > > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> >> > > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> >> > > >>> > >> >> >> >> > > >I understand this appears in other apis, but
> >>it
> >> > is a
> >> > > >>> > >>constant
> >> > > >>> > >> >> >> >>cause of
> >> > > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
> >>need a
> >> > way
> >> > > >>>to
> >> > > >>> > >>wait
> >> > > >>> > >> >> >> >> indefinitely
> >> > > >>> > >> >> >> >> > > >as
> >> > > >>> > >> >> >> >> > > >we already have close() so having a magical
> >> > constant
> >> > > >>>for
> >> > > >>> > >> >>that is
> >> > > >>> > >> >> >> >> > > >redundant.
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
> >>already
> >> > > >>>possible
> >> > > >>> > >>and
> >> > > >>> > >> >> >>would
> >> > > >>> > >> >> >> >> block
> >> > > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
> >>change
> >> > the
> >> > > >>> > >>behavior
> >> > > >>> > >> >>is
> >> > > >>> > >> >> >> >> probably
> >> > > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
> >>the
> >> > > >>> callback
> >> > > >>> > >> >>there
> >> > > >>> > >> >> >>is
> >> > > >>> > >> >> >> >> > > >actually
> >> > > >>> > >> >> >> >> > > >some misunderstanding and they need to think
> >>more,
> >> > > >>> > >>silently
> >> > > >>> > >> >> >>making
> >> > > >>> > >> >> >> >> this
> >> > > >>> > >> >> >> >> > > >not
> >> > > >>> > >> >> >> >> > > >block will hide the problem from them which
> >>is the
> >> > > >>> > >>opposite
> >> > > >>> > >> >>of
> >> > > >>> > >> >> >> >>what we
> >> > > >>> > >> >> >> >> > > >want.
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >-Jay
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> > > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > > >wrote:
> >> > > >>> > >> >> >> >> > > >
> >> > > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
> >>thread.
> >> > > >>>Since it
> >> > > >>> > >> >>seems
> >> > > >>> > >> >> >>we
> >> > > >>> > >> >> >> >> > probably
> >> > > >>> > >> >> >> >> > > >> will have more discussion on this, I am
> >>just
> >> > > >>>replying
> >> > > >>> > >>from
> >> > > >>> > >> >>the
> >> > > >>> > >> >> >> >> > > >>discussion
> >> > > >>> > >> >> >> >> > > >> thread here.
> >> > > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
> >>like
> >> > > >>> > >>half-baked,
> >> > > >>> > >> >> >> >>apologize
> >> > > >>> > >> >> >> >> > for
> >> > > >>> > >> >> >> >> > > >> the rush...
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> The contract in current KIP is:
> >> > > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
> >>either
> >> > are
> >> > > >>>sent
> >> > > >>> > >>or
> >> > > >>> > >> >> >>reach
> >> > > >>> > >> >> >> >> > request
> >> > > >>> > >> >> >> >> > > >> timeout.
> >> > > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
> >>close
> >> > > >>> > >>immediately
> >> > > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> >> > equivalent
> >> > > >>>to
> >> > > >>> > >> >>close(),
> >> > > >>> > >> >> >> >>i.e.
> >> > > >>> > >> >> >> >> > Wait
> >> > > >>> > >> >> >> >> > > >> until all requests are sent or reach
> >>request
> >> > > >>>timeout
> >> > > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
> >>the
> >> > > >>>best to
> >> > > >>> > >> >>finish
> >> > > >>> > >> >> >> >> sending
> >> > > >>> > >> >> >> >> > > >>in 5
> >> > > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> >> > > >>>shutdown
> >> > > >>> the
> >> > > >>> > >> >> >>producer
> >> > > >>> > >> >> >> >> > > >>anyway,
> >> > > >>> > >> >> >> >> > > >> my callback will handle the failures.
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> About how we define what timeout value
> >>stands
> >> > for,
> >> > > >>>I
> >> > > >>> > >> >>actually
> >> > > >>> > >> >> >> >> > struggled
> >> > > >>> > >> >> >> >> > > >>a
> >> > > >>> > >> >> >> >> > > >> little bit when wrote the patch.
> >>Intuitively,
> >> > > >>>close(0)
> >> > > >>> > >> >>should
> >> > > >>> > >> >> >> >>mean
> >> > > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> >> > > >>>existing
> >> > > >>> java
> >> > > >>> > >> >>class
> >> > > >>> > >> >> >> >>have
> >> > > >>> > >> >> >> >> > this
> >> > > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
> >> > never
> >> > > >>> > >>timeout
> >> > > >>> > >> >> >> >> > > >>(Thread.join(0),
> >> > > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma
> >>is
> >> > > >>>either we
> >> > > >>> > >> >>follow
> >> > > >>> > >> >> >>the
> >> > > >>> > >> >> >> >> > > >> intuition or we follow the convention.
> >>What I
> >> > > >>>chose is
> >> > > >>> > >>to
> >> > > >>> > >> >> >>follow
> >> > > >>> > >> >> >> >>the
> >> > > >>> > >> >> >> >> > > >> convention but document the interface to
> >>let
> >> > user
> >> > > >>>be
> >> > > >>> > >>aware
> >> > > >>> > >> >>of
> >> > > >>> > >> >> >>the
> >> > > >>> > >> >> >> >> > usage.
> >> > > >>> > >> >> >> >> > > >> The reason is that I think
> >>producer.close() is a
> >> > > >>> public
> >> > > >>> > >> >> >> >>interface so
> >> > > >>> > >> >> >> >> > it
> >> > > >>> > >> >> >> >> > > >> might be better to follow java convention.
> >> > Whereas
> >> > > >>> > >> >>selector is
> >> > > >>> > >> >> >> >>not a
> >> > > >>> > >> >> >> >> > > >> public interface that used by user, so as
> >>long
> >> > as
> >> > > >>>it
> >> > > >>> > >>makes
> >> > > >>> > >> >> >>sense
> >> > > >>> > >> >> >> >>to
> >> > > >>> > >> >> >> >> > us,
> >> > > >>> > >> >> >> >> > > >>it
> >> > > >>> > >> >> >> >> > > >> is less a problem to be different from java
> >> > > >>> convention.
> >> > > >>> > >> >>That
> >> > > >>> > >> >> >>said
> >> > > >>> > >> >> >> >> > since
> >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> >> > interface,
> >> > > >>>I
> >> > > >>> > >>think
> >> > > >>> > >> >>it
> >> > > >>> > >> >> >> >>also
> >> > > >>> > >> >> >> >> > makes
> >> > > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
> >>same
> >> > > >>> > >>definition
> >> > > >>> > >> >>of
> >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
> >>close
> >> > > >>>would
> >> > > >>> > >>be
> >> > > >>> > >> >> >> >>separating
> >> > > >>> > >> >> >> >> > the
> >> > > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> >> > probably
> >> > > >>> makes
> >> > > >>> > >> >> >>sense. I
> >> > > >>> > >> >> >> >> > would
> >> > > >>> > >> >> >> >> > > >> guess typically the request timeout would
> >>be
> >> > long
> >> > > >>> (e.g.
> >> > > >>> > >>60
> >> > > >>> > >> >> >> >>seconds)
> >> > > >>> > >> >> >> >> > > >> because we might want to consider retries
> >>with
> >> > > >>>back
> >> > > >>> off
> >> > > >>> > >> >>time.
> >> > > >>> > >> >> >>If
> >> > > >>> > >> >> >> >>we
> >> > > >>> > >> >> >> >> > have
> >> > > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
> >>case
> >> > > >>>that
> >> > > >>> > >>could
> >> > > >>> > >> >>take
> >> > > >>> > >> >> >> >>up to
> >> > > >>> > >> >> >> >> > > >> several minutes to complete all the
> >>requests.
> >> > But
> >> > > >>>when
> >> > > >>> > >>we
> >> > > >>> > >> >> >>close a
> >> > > >>> > >> >> >> >> > > >> producer, we might not want to wait for
> >>that
> >> > long
> >> > > >>>as
> >> > > >>> it
> >> > > >>> > >> >>might
> >> > > >>> > >> >> >> >>cause
> >> > > >>> > >> >> >> >> > some
> >> > > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> There is also a subtle difference between
> >> > > >>> close(timeout)
> >> > > >>> > >> >>and
> >> > > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
> >>flush() is
> >> > to
> >> > > >>> write
> >> > > >>> > >> >>data
> >> > > >>> > >> >> >>to
> >> > > >>> > >> >> >> >>the
> >> > > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
> >>until
> >> > > >>> request
> >> > > >>> > >> >> >>timeout.
> >> > > >>> > >> >> >> >>I
> >> > > >>> > >> >> >> >> > think
> >> > > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange.
> >>On the
> >> > > >>>other
> >> > > >>> > >> >>hand,
> >> > > >>> > >> >> >>the
> >> > > >>> > >> >> >> >>top
> >> > > >>> > >> >> >> >> > > >> priority for close() is to close the
> >>producer
> >> > > >>>rather
> >> > > >>> > >>than
> >> > > >>> > >> >> >>flush()
> >> > > >>> > >> >> >> >> > data,
> >> > > >>> > >> >> >> >> > > >>so
> >> > > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> >> > waiting
> >> > > >>>for
> >> > > >>> > >>its
> >> > > >>> > >> >>main
> >> > > >>> > >> >> >> >>job.
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
> >>flag.
> >> > It
> >> > > >>>is
> >> > > >>> > >>not a
> >> > > >>> > >> >> >>public
> >> > > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
> >>Changes
> >> > > >>>section
> >> > > >>> > >> >>which I
> >> > > >>> > >> >> >> >> thought
> >> > > >>> > >> >> >> >> > > >>was
> >> > > >>> > >> >> >> >> > > >> supposed to provide implementation details.
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> >> > suggestions!
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> >> > > >>> jqin@linkedin.com>
> >> > > >>> > >> >> wrote:
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> >> > comments.
> >> > > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
> >>if no
> >> > > >>>further
> >> > > >>> > >> >> >>comments
> >> > > >>> > >> >> >> >>are
> >> > > >>> > >> >> >> >> > > >> >received by tomorrow.
> >> > > >>> > >> >> >> >> > > >> >
> >> > > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> > > >>> > >> >> >> >> > > >> >
> >> > > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> >> > > >>><jay.kreps@gmail.com
> >> > > >>> >
> >> > > >>> > >> >>wrote:
> >> > > >>> > >> >> >> >> > > >> >
> >> > > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and
> >>use
> >> > > >>>cases
> >> > > >>> for
> >> > > >>> > >>the
> >> > > >>> > >> >> >> >>feature?
> >> > > >>> > >> >> >> >> > > >>This
> >> > > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> >> > > >>>aborting
> >> > > >>> > >>from
> >> > > >>> > >> >> >>inside
> >> > > >>> > >> >> >> >>the
> >> > > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
> >>that
> >> > > >>>usage
> >> > > >>> or
> >> > > >>> > >>why
> >> > > >>> > >> >> >>other
> >> > > >>> > >> >> >> >> > people
> >> > > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
> >>list
> >> > the
> >> > > >>> > >>primary
> >> > > >>> > >> >>use
> >> > > >>> > >> >> >> >>case
> >> > > >>> > >> >> >> >> for
> >> > > >>> > >> >> >> >> > > >> >>having
> >> > > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
> >> > avoid
> >> > > >>> > >>blocking
> >> > > >>> > >> >>too
> >> > > >>> > >> >> >> >>long
> >> > > >>> > >> >> >> >> on
> >> > > >>> > >> >> >> >> > > >> >>shutdown.
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>-Jay
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
> >>Jiangjie Qin
> >> > > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> >> > > >>> > >> >> >> >> > > >> >
> >> > > >>> > >> >> >> >> > > >> >>wrote:
> >> > > >>> > >> >> >> >> > > >> >>
> >> > > >>> > >> >> >> >> > > >> >>> Hi,
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> >> > > >>>close(timeout)
> >> > > >>> to
> >> > > >>> > >>new
> >> > > >>> > >> >> >> >> producer.
> >> > > >>> > >> >> >> >> > > >>Most
> >> > > >>> > >> >> >> >> > > >> >>>of
> >> > > >>> > >> >> >> >> > > >> >>> the previous discussions are in
> >>KAFKA-1660
> >> > > >>>where
> >> > > >>> > >>Parth
> >> > > >>> > >> >> >> >> Brahmbhatt
> >> > > >>> > >> >> >> >> > > >>has
> >> > > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> >> > > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
> >>we are
> >> > > >>>going
> >> > > >>> > >> >>through
> >> > > >>> > >> >> >>the
> >> > > >>> > >> >> >> >> KIP
> >> > > >>> > >> >> >> >> > > >> >>>process.
> >> > > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >>
> >> > > >>> > >> >>
> >> > > >>> > >>
> >> > > >>> >
> >> > > >>>
> >> > >
> >>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> > > >>> > >> >> >> >> > > >> >>>2
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >> >>> Thanks.
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> > > >>> > >> >> >> >> > > >> >>>
> >> > > >>> > >> >> >> >> > > >> >
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > > >>
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> > >
> >> > > >>> > >> >> >> >> >
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >> --
> >> > > >>> > >> >> >> >> Thanks,
> >> > > >>> > >> >> >> >> Neha
> >> > > >>> > >> >> >> >>
> >> > > >>> > >> >> >> >
> >> > > >>> > >> >> >> >
> >> > > >>> > >> >> >> >
> >> > > >>> > >> >> >> >--
> >> > > >>> > >> >> >> >-- Guozhang
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >>
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >
> >> > > >>> > >> >> >--
> >> > > >>> > >> >> >-- Guozhang
> >> > > >>> > >> >>
> >> > > >>> > >> >>
> >> > > >>> > >> >
> >> > > >>> > >> >
> >> > > >>> > >> >--
> >> > > >>> > >> >-- Guozhang
> >> > > >>> > >>
> >> > > >>> > >>
> >> > > >>> >
> >> > > >>> >
> >> > > >>>
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >>--
> >> > > >>-- Guozhang
> >> > > >
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Neha
> >
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Neha,

As Joel said, (1) and (2) provides almost exact same function as far as I
can see. I actually have no strong preference.
The only difference is that (2) provides the close() semantic meaning of
making sure all the resources have gone away, at the cost of adding
another abort() interface.
The simplicity of (1) is also attractive but at a slight cost of clarity.
I am slightly leaning towards (2) but would also be OK if we pursue (1).

Thanks.

Jiangjie (Becket) Qin


On 3/19/15, 2:46 PM, "Joel Koshy" <jj...@gmail.com> wrote:

>(1) should work, but as Jun suggested earlier in the thread it is
>slightly misleading. The (intuitive) post-condition of "close" is that
>the producer has shutdown - i.e., its sender thread, closed its
>metrics, serializer/deserializer, etc. That is not necessarily a
>post-condition of "close(0)" although one can contend that if you call
>the method in non-blocking mode (zero timeout) then it is reasonable
>to not expect that post-condition.
>
>So I think that although (2) adds one more API it brings "simplicity
>by virtue of overall clarity".
>
>I would be in favor of (2) but not strongly opposed to (1).
>
>Thanks,
>
>Joel
>
>On Thu, Mar 19, 2015 at 10:05:04AM -0700, Neha Narkhede wrote:
>> I'm in favor of (1) for the sake of simplicity and as Jay mentions to
>> reduce the number of different APIs. Can you explain when (1) does not
>>work?
>> 
>> On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:
>> 
>> > Personally I'm in favor of (1) just to reduce the number of different
>>APIs.
>> > People will find the difference between abort and close subtle and
>> > confusing and the only instance where you want it is this somewhat
>>unusual
>> > case you guys are pursuing, right?
>> >
>> > -Jay
>> >
>> > On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> > wrote:
>> >
>> > > It looks we have another option and are now deciding between the
>> > following
>> > > two interfaces:
>> > >
>> > > 1. Close() + close(timeout)
>> > >   - timeout could be either positive or zero.
>> > >   - only close(0) can be called from sender thread
>> > >
>> > > 2. Close() + abort() + close(timeout)
>> > >   - timeout can either be positive or zero
>> > >   - only abort() can be called from sender thread
>> > >
>> > >   - abort() is equivalent to close(0) in 1) but does not join sender
>> > > thread and does not close metrics.
>> > >   - Another thread has to call close() or close(timeout) in order
>>to make
>> > > sure the resources in producer are gone.
>> > >
>> > > The tow approach provides the same function we need, the difference
>>is
>> > > approach 2) follows convention of close() and abort(). On the other
>>hand,
>> > > approach 1) saves one interface compared with approach 2) but does
>>not
>> > > follow the convention.
>> > >
>> > > When the two approaches come to user code, it is probably something
>>like
>> > > this:
>> > >
>> > > Try {
>> > >   While(!finished)
>> > >     Producer.send(record, callback)
>> > > } catch (Exception e) {
>> > >   Producer.close(5)
>> > > }
>> > >
>> > > Class CallbackImpl implements Callback {
>> > >   onCompletion(RecordMetadata metadata Exception e) {
>> > >     If (e != null)
>> > >       Abort() / close()
>> > >   }
>> > > }
>> > >
>> > > Because the two approach leads to almost the same user code,
>>assuming
>> > > users are always calling producer.close() as a clean up step,
>>personally
>> > I
>> > > prefer approach 2) as it follows convention.
>> > >
>> > > Any thoughts?
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> > >
>> > > >Hi Jun,
>> > > >
>> > > >Yes, as Guozhang said, the main reason we set a flag is because
>>close(0)
>> > > >is expected to be called by sender thread itself.
>> > > >If we want to maintain the semantic meaning of close(), one
>>alternative
>> > is
>> > > >to have an abort() method does the same thing as close(0) except
>> > cleanup.
>> > > >And in close(timeout), after timeout we call abort() and join the
>>sender
>> > > >thread. This was one of the previous proposal. We merged abort to
>> > close(0)
>> > > >because they are almost doing the same thing. But from what you
>> > mentioned,
>> > > >it might make sense to have two separate methods.
>> > > >
>> > > >Thanks.
>> > > >
>> > > >Jiangjie (Becket) Qin
>> > > >
>> > > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> > > >
>> > > >>Yeah in this sense the sender thread will not exist immediately
>>in the
>> > > >>close(0) call, but will only terminate after the current response
>>batch
>> > > >>has
>> > > >>been processed, as will the producer instance itself.
>> > > >>
>> > > >>There is a reason for this though: for a clean shutdown the caller
>> > thread
>> > > >>has to wait for the sender thread to join before closing the
>>producer
>> > > >>instance, but this cannot be achieve if close(0) is called by the
>> > sender
>> > > >>thread itself (for example in KAFKA-1659, there is a proposal from
>> > Andrew
>> > > >>Stein on using thread.interrupt and thread.stop, but if it is
>>called by
>> > > >>the
>> > > >>ioThread itself the stop call will fail). Hence we came up with
>>the
>> > flag
>> > > >>approach to let the sender thread to close as soon as it is at the
>> > > >>barrier
>> > > >>of the run loop.
>> > > >>
>> > > >>Guozhang
>> > > >>
>> > > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
>> > > >>
>> > > >>> Hmm, does that mean that after close(0), the sender thread is
>>not
>> > > >>>necessary
>> > > >>> gone? Normally, after closing an entity, we expect all internal
>> > threads
>> > > >>> associated with the entity are shut down completely.
>> > > >>>
>> > > >>> Thanks,
>> > > >>>
>> > > >>> Jun
>> > > >>>
>> > > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>> > > >>><jq...@linkedin.com.invalid>
>> > > >>> wrote:
>> > > >>>
>> > > >>> > Hi Jun,
>> > > >>> >
>> > > >>> > Close(0) will set two flags in sender. Running=false and a
>>newly
>> > > >>>added
>> > > >>> > forceClose=true. It will also set accumulator.closed=true so
>>no
>> > > >>>further
>> > > >>> > producer.send() will succeed.
>> > > >>> > The sender thread will finish executing all the callbacks in
>> > current
>> > > >>> batch
>> > > >>> > of responses, then it will see the forceClose flag. It will
>>just
>> > fail
>> > > >>>all
>> > > >>> > the incomplete batches in the producer and exit.
>> > > >>> > So close(0) is a non-blocking call and sender thread will not
>>try
>> > to
>> > > >>>join
>> > > >>> > itself in close(0).
>> > > >>> >
>> > > >>> > Thanks.
>> > > >>> >
>> > > >>> > Jiangjie (Becket) Qin
>> > > >>> >
>> > > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
>> > > >>> >
>> > > >>> > >How does close(0) work if it's called from the sender
>>thread? If
>> > > >>> close(0)
>> > > >>> > >needs to wait for the sender thread to join, wouldn't this
>>cause a
>> > > >>> > >deadlock?
>> > > >>> > >
>> > > >>> > >Thanks,
>> > > >>> > >
>> > > >>> > >Jun
>> > > >>> > >
>> > > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>> > > >>><jqin@linkedin.com.invalid
>> > > >>> >
>> > > >>> > >wrote:
>> > > >>> > >
>> > > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
>>without
>> > > >>> > >> discussing with you :)
>> > > >>> > >>
>> > > >>> > >> Jiangjie (Becket) Qin
>> > > >>> > >>
>> > > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com>
>> > wrote:
>> > > >>> > >>
>> > > >>> > >> >Thanks Jiangjie,
>> > > >>> > >> >
>> > > >>> > >> >After talking to you offline on this, I have been
>>convinced and
>> > > >>> > >>changed my
>> > > >>> > >> >preference to blocking. The immediate shutdown approach
>>does
>> > have
>> > > >>> some
>> > > >>> > >> >unsafeness in some cases.
>> > > >>> > >> >
>> > > >>> > >> >Guozhang
>> > > >>> > >> >
>> > > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>> > > >>> > >><jqin@linkedin.com.invalid
>> > > >>> > >> >
>> > > >>> > >> >wrote:
>> > > >>> > >> >
>> > > >>> > >> >> It looks that the problem we want to solve and the
>>purpose we
>> > > >>>want
>> > > >>> to
>> > > >>> > >> >> achieve is:
>> > > >>> > >> >> If user uses close() in callback, we want to let user be
>> > aware
>> > > >>>that
>> > > >>> > >>they
>> > > >>> > >> >> should use close(0) instead of close() in the callback.
>> > > >>> > >> >>
>> > > >>> > >> >> We have agreed that we will have an error log to inform
>>user
>> > > >>>about
>> > > >>> > >>this
>> > > >>> > >> >> mis-usage. The options differ in the way how we can
>>force
>> > user
>> > > >>>to
>> > > >>> > >>take a
>> > > >>> > >> >> look at that error log.
>> > > >>> > >> >> There are two scenarios:
>> > > >>> > >> >> 1. User does not expect the program to exit.
>> > > >>> > >> >> 2. User expect the program to exit.
>> > > >>> > >> >>
>> > > >>> > >> >> For scenario 1), blocking will probably delay the
>>discovery
>> > of
>> > > >>>the
>> > > >>> > >> >> problem. Calling close(0) exposes the problem quicker.
>>In
>> > this
>> > > >>> > >>scenario
>> > > >>> > >> >> producer just encounter a send failure when running
>>normally.
>> > > >>> > >> >> For scenario 2), blocking will expose the problem quick.
>> > > >>>Calling
>> > > >>> > >> >>close(-1)
>> > > >>> > >> >> might hide the problem. This scenario might include: a)
>>Unit
>> > > >>>test
>> > > >>> > >>for a
>> > > >>> > >> >> send failure. b) Message sending during a close() call
>>from a
>> > > >>>user
>> > > >>> > >> >>thread.
>> > > >>> > >> >>
>> > > >>> > >> >> So as a summary table:
>> > > >>> > >> >>
>> > > >>> > >> >>                   Scenario 1)
>> >  Scenario
>> > > >>>2)
>> > > >>> > >> >>
>> > > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
>> > > >>>problem
>> > > >>> > >> >>discovery
>> > > >>> > >> >>
>> > > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
>>might
>> > be
>> > > >>> hidden
>> > > >>> > >> >>
>> > > >>> > >> >>
>> > > >>> > >> >> Personally I prefer blocking because it seems providing
>>more
>> > > >>> > >>guarantees
>> > > >>> > >> >> and safer.
>> > > >>> > >> >>
>> > > >>> > >> >> Thanks.
>> > > >>> > >> >>
>> > > >>> > >> >> Jiangjie (Becket) Qin
>> > > >>> > >> >>
>> > > >>> > >> >>
>> > > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
>><wa...@gmail.com>
>> > > >>>wrote:
>> > > >>> > >> >>
>> > > >>> > >> >> >HI Jiangjie,
>> > > >>> > >> >> >
>> > > >>> > >> >> >As far as I understand calling close() in the ioThread
>>is
>> > not
>> > > >>> > >>common,
>> > > >>> > >> >>as
>> > > >>> > >> >> >it
>> > > >>> > >> >> >may only trigger when we saw some non-retriable error.
>>Hence
>> > > >>>when
>> > > >>> > >>user
>> > > >>> > >> >>run
>> > > >>> > >> >> >their program it is unlikely that close() will be
>>triggered
>> > > >>>and
>> > > >>> > >>problem
>> > > >>> > >> >> >will be detected. So it seems to me that from the error
>> > > >>>detection
>> > > >>> > >> >>aspect
>> > > >>> > >> >> >these two options seems to be the same as people will
>> > usually
>> > > >>> > >>detect it
>> > > >>> > >> >> >from the producer metrics all dropping to 0.
>> > > >>> > >> >> >
>> > > >>> > >> >> >Guozhang
>> > > >>> > >> >> >
>> > > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> > > >>> > >> >><jq...@linkedin.com.invalid>
>> > > >>> > >> >> >wrote:
>> > > >>> > >> >> >
>> > > >>> > >> >> >> It seems there are two options we can choose from
>>when
>> > > >>>close()
>> > > >>> is
>> > > >>> > >> >>called
>> > > >>> > >> >> >> from sender thread (callback):
>> > > >>> > >> >> >> 1. Log an error and close the producer using
>>close(-1)
>> > > >>> > >> >> >> 2. Log an error and block.
>> > > >>> > >> >> >> (Throwing an exception will not work because we
>>catch all
>> > > >>>the
>> > > >>> > >> >>exception
>> > > >>> > >> >> >> thrown from user callback. It will just lead to an
>>error
>> > > >>>log.)
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> My concern for the first option is that the producer
>>will
>> > be
>> > > >>> > >>closed
>> > > >>> > >> >>even
>> > > >>> > >> >> >> if we logged and error. I am wondering if some user
>>would
>> > > >>>not
>> > > >>> even
>> > > >>> > >> >>take
>> > > >>> > >> >> >>a
>> > > >>> > >> >> >> look at the log if producer is closed normally.
>>Because
>> > from
>> > > >>>the
>> > > >>> > >> >> >>programs
>> > > >>> > >> >> >> behavior, everything looks good. If that is the
>>case, the
>> > > >>>error
>> > > >>> > >> >>message
>> > > >>> > >> >> >>we
>> > > >>> > >> >> >> logged probably will just be ignored until some day
>>when
>> > > >>>people
>> > > >>> > >>check
>> > > >>> > >> >> >>the
>> > > >>> > >> >> >> log and see it.
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> As for the second option, because producer does not
>>close
>> > > >>>but
>> > > >>> > >>blocks.
>> > > >>> > >> >> >>User
>> > > >>> > >> >> >> will notice this the first time they run the
>>program. They
>> > > >>> > >>probably
>> > > >>> > >> >>will
>> > > >>> > >> >> >> look at the log to see why producer could not be
>>closed
>> > and
>> > > >>>they
>> > > >>> > >>will
>> > > >>> > >> >> >>see
>> > > >>> > >> >> >> the error log we put there. So they will get informed
>> > about
>> > > >>>this
>> > > >>> > >> >> >>mis-usage
>> > > >>> > >> >> >> of close() in sender thread the first time they run
>>the
>> > code
>> > > >>> > >>instead
>> > > >>> > >> >>of
>> > > >>> > >> >> >> some time later.
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> Personally I prefer the second one because it is more
>> > > >>>obvious
>> > > >>> that
>> > > >>> > >> >> >> something was wrong.
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> Jiangjie (Becket) Qin
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
>><wa...@gmail.com>
>> > > >>> wrote:
>> > > >>> > >> >> >>
>> > > >>> > >> >> >> >Yeah I agree we should not silently change the
>>behavior
>> > of
>> > > >>>the
>> > > >>> > >> >>function
>> > > >>> > >> >> >> >with the given parameters; and I would prefer
>> > > >>> > >> >> >>error-logging-and-shutdown
>> > > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
>> > > >>>suggested
>> > > >>> > >> >>blocking
>> > > >>> > >> >> >> >would also not proceed with sending any data, bu
>>will
>> > just
>> > > >>>let
>> > > >>> > >> >>users to
>> > > >>> > >> >> >> >realize the issue later than sooner.
>> > > >>> > >> >> >> >
>> > > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>> > > >>> > >><ne...@confluent.io>
>> > > >>> > >> >> >>wrote:
>> > > >>> > >> >> >> >
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > And I also agree it is better if we can make
>>producer
>> > > >>>block
>> > > >>> > >>when
>> > > >>> > >> >> >> >> > close() is called from sender thread so user
>>will
>> > > >>>notice
>> > > >>> > >> >>something
>> > > >>> > >> >> >> >>went
>> > > >>> > >> >> >> >> > wrong.
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >> This isn't a great experience either. Why can't
>>we just
>> > > >>>throw
>> > > >>> > >>an
>> > > >>> > >> >> >> >>exception
>> > > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
>>the
>> > > >>>user to
>> > > >>> > >> >>know.
>> > > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
>> > > >>>annoying.
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>> > > >>> > >><ja...@gmail.com>
>> > > >>> > >> >> >> wrote:
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >> > Cool.
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > I think blocking is good or alternately
>>throwing an
>> > > >>> exception
>> > > >>> > >> >> >>directly
>> > > >>> > >> >> >> >> from
>> > > >>> > >> >> >> >> > close(). Basically I would just worry about
>>subtly
>> > > >>>doing
>> > > >>> > >> >>something
>> > > >>> > >> >> >> >> slightly
>> > > >>> > >> >> >> >> > different from what the user asked for as it
>>will be
>> > > >>>hard
>> > > >>> to
>> > > >>> > >> >>notice
>> > > >>> > >> >> >> >>that
>> > > >>> > >> >> >> >> > behavior difference.
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > -Jay
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> > > >>> > >> >> >> >><jqin@linkedin.com.invalid
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > wrote:
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >> > > Hi Jay,
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
>>thinks
>> > as
>> > > >>> long
>> > > >>> > >>as
>> > > >>> > >> >>we
>> > > >>> > >> >> >> >>have
>> > > >>> > >> >> >> >> > > consistent define for timeout across Kafka
>> > interface,
>> > > >>> there
>> > > >>> > >> >>would
>> > > >>> > >> >> >> >>be no
>> > > >>> > >> >> >> >> > > problem. And I also agree it is better if we
>>can
>> > make
>> > > >>> > >>producer
>> > > >>> > >> >> >>block
>> > > >>> > >> >> >> >> when
>> > > >>> > >> >> >> >> > > close() is called from sender thread so user
>>will
>> > > >>>notice
>> > > >>> > >> >> >>something
>> > > >>> > >> >> >> >>went
>> > > >>> > >> >> >> >> > > wrong.
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > Thanks.
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>> > > >>><ja...@gmail.com>
>> > > >>> > >> wrote:
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > >Hey Jiangjie,
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >I think this is going to be very confusing
>>that
>> > > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
>> > > >>> > >> >> >> >> > > >  close(-1) waits for 0.
>> > > >>> > >> >> >> >> > > >I understand this appears in other apis, but
>>it
>> > is a
>> > > >>> > >>constant
>> > > >>> > >> >> >> >>cause of
>> > > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
>>need a
>> > way
>> > > >>>to
>> > > >>> > >>wait
>> > > >>> > >> >> >> >> indefinitely
>> > > >>> > >> >> >> >> > > >as
>> > > >>> > >> >> >> >> > > >we already have close() so having a magical
>> > constant
>> > > >>>for
>> > > >>> > >> >>that is
>> > > >>> > >> >> >> >> > > >redundant.
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
>>already
>> > > >>>possible
>> > > >>> > >>and
>> > > >>> > >> >> >>would
>> > > >>> > >> >> >> >> block
>> > > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
>>change
>> > the
>> > > >>> > >>behavior
>> > > >>> > >> >>is
>> > > >>> > >> >> >> >> probably
>> > > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
>>the
>> > > >>> callback
>> > > >>> > >> >>there
>> > > >>> > >> >> >>is
>> > > >>> > >> >> >> >> > > >actually
>> > > >>> > >> >> >> >> > > >some misunderstanding and they need to think
>>more,
>> > > >>> > >>silently
>> > > >>> > >> >> >>making
>> > > >>> > >> >> >> >> this
>> > > >>> > >> >> >> >> > > >not
>> > > >>> > >> >> >> >> > > >block will hide the problem from them which
>>is the
>> > > >>> > >>opposite
>> > > >>> > >> >>of
>> > > >>> > >> >> >> >>what we
>> > > >>> > >> >> >> >> > > >want.
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >-Jay
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> > > >>> > >> >> >> >> <jqin@linkedin.com.invalid
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > > >wrote:
>> > > >>> > >> >> >> >> > > >
>> > > >>> > >> >> >> >> > > >> Hey Joe & Jay,
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
>>thread.
>> > > >>>Since it
>> > > >>> > >> >>seems
>> > > >>> > >> >> >>we
>> > > >>> > >> >> >> >> > probably
>> > > >>> > >> >> >> >> > > >> will have more discussion on this, I am
>>just
>> > > >>>replying
>> > > >>> > >>from
>> > > >>> > >> >>the
>> > > >>> > >> >> >> >> > > >>discussion
>> > > >>> > >> >> >> >> > > >> thread here.
>> > > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
>>like
>> > > >>> > >>half-baked,
>> > > >>> > >> >> >> >>apologize
>> > > >>> > >> >> >> >> > for
>> > > >>> > >> >> >> >> > > >> the rush...
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> The contract in current KIP is:
>> > > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
>>either
>> > are
>> > > >>>sent
>> > > >>> > >>or
>> > > >>> > >> >> >>reach
>> > > >>> > >> >> >> >> > request
>> > > >>> > >> >> >> >> > > >> timeout.
>> > > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
>>close
>> > > >>> > >>immediately
>> > > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
>> > equivalent
>> > > >>>to
>> > > >>> > >> >>close(),
>> > > >>> > >> >> >> >>i.e.
>> > > >>> > >> >> >> >> > Wait
>> > > >>> > >> >> >> >> > > >> until all requests are sent or reach
>>request
>> > > >>>timeout
>> > > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
>>the
>> > > >>>best to
>> > > >>> > >> >>finish
>> > > >>> > >> >> >> >> sending
>> > > >>> > >> >> >> >> > > >>in 5
>> > > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
>> > > >>>shutdown
>> > > >>> the
>> > > >>> > >> >> >>producer
>> > > >>> > >> >> >> >> > > >>anyway,
>> > > >>> > >> >> >> >> > > >> my callback will handle the failures.
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> About how we define what timeout value
>>stands
>> > for,
>> > > >>>I
>> > > >>> > >> >>actually
>> > > >>> > >> >> >> >> > struggled
>> > > >>> > >> >> >> >> > > >>a
>> > > >>> > >> >> >> >> > > >> little bit when wrote the patch.
>>Intuitively,
>> > > >>>close(0)
>> > > >>> > >> >>should
>> > > >>> > >> >> >> >>mean
>> > > >>> > >> >> >> >> > > >> immediately, however it seems that all the
>> > > >>>existing
>> > > >>> java
>> > > >>> > >> >>class
>> > > >>> > >> >> >> >>have
>> > > >>> > >> >> >> >> > this
>> > > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
>> > never
>> > > >>> > >>timeout
>> > > >>> > >> >> >> >> > > >>(Thread.join(0),
>> > > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma
>>is
>> > > >>>either we
>> > > >>> > >> >>follow
>> > > >>> > >> >> >>the
>> > > >>> > >> >> >> >> > > >> intuition or we follow the convention.
>>What I
>> > > >>>chose is
>> > > >>> > >>to
>> > > >>> > >> >> >>follow
>> > > >>> > >> >> >> >>the
>> > > >>> > >> >> >> >> > > >> convention but document the interface to
>>let
>> > user
>> > > >>>be
>> > > >>> > >>aware
>> > > >>> > >> >>of
>> > > >>> > >> >> >>the
>> > > >>> > >> >> >> >> > usage.
>> > > >>> > >> >> >> >> > > >> The reason is that I think
>>producer.close() is a
>> > > >>> public
>> > > >>> > >> >> >> >>interface so
>> > > >>> > >> >> >> >> > it
>> > > >>> > >> >> >> >> > > >> might be better to follow java convention.
>> > Whereas
>> > > >>> > >> >>selector is
>> > > >>> > >> >> >> >>not a
>> > > >>> > >> >> >> >> > > >> public interface that used by user, so as
>>long
>> > as
>> > > >>>it
>> > > >>> > >>makes
>> > > >>> > >> >> >>sense
>> > > >>> > >> >> >> >>to
>> > > >>> > >> >> >> >> > us,
>> > > >>> > >> >> >> >> > > >>it
>> > > >>> > >> >> >> >> > > >> is less a problem to be different from java
>> > > >>> convention.
>> > > >>> > >> >>That
>> > > >>> > >> >> >>said
>> > > >>> > >> >> >> >> > since
>> > > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
>> > interface,
>> > > >>>I
>> > > >>> > >>think
>> > > >>> > >> >>it
>> > > >>> > >> >> >> >>also
>> > > >>> > >> >> >> >> > makes
>> > > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
>>same
>> > > >>> > >>definition
>> > > >>> > >> >>of
>> > > >>> > >> >> >> >> > > >> consumer.poll(timeout).
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
>>close
>> > > >>>would
>> > > >>> > >>be
>> > > >>> > >> >> >> >>separating
>> > > >>> > >> >> >> >> > the
>> > > >>> > >> >> >> >> > > >> close timeout from request timeout, which
>> > probably
>> > > >>> makes
>> > > >>> > >> >> >>sense. I
>> > > >>> > >> >> >> >> > would
>> > > >>> > >> >> >> >> > > >> guess typically the request timeout would
>>be
>> > long
>> > > >>> (e.g.
>> > > >>> > >>60
>> > > >>> > >> >> >> >>seconds)
>> > > >>> > >> >> >> >> > > >> because we might want to consider retries
>>with
>> > > >>>back
>> > > >>> off
>> > > >>> > >> >>time.
>> > > >>> > >> >> >>If
>> > > >>> > >> >> >> >>we
>> > > >>> > >> >> >> >> > have
>> > > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
>>case
>> > > >>>that
>> > > >>> > >>could
>> > > >>> > >> >>take
>> > > >>> > >> >> >> >>up to
>> > > >>> > >> >> >> >> > > >> several minutes to complete all the
>>requests.
>> > But
>> > > >>>when
>> > > >>> > >>we
>> > > >>> > >> >> >>close a
>> > > >>> > >> >> >> >> > > >> producer, we might not want to wait for
>>that
>> > long
>> > > >>>as
>> > > >>> it
>> > > >>> > >> >>might
>> > > >>> > >> >> >> >>cause
>> > > >>> > >> >> >> >> > some
>> > > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> There is also a subtle difference between
>> > > >>> close(timeout)
>> > > >>> > >> >>and
>> > > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
>>flush() is
>> > to
>> > > >>> write
>> > > >>> > >> >>data
>> > > >>> > >> >> >>to
>> > > >>> > >> >> >> >>the
>> > > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
>>until
>> > > >>> request
>> > > >>> > >> >> >>timeout.
>> > > >>> > >> >> >> >>I
>> > > >>> > >> >> >> >> > think
>> > > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange.
>>On the
>> > > >>>other
>> > > >>> > >> >>hand,
>> > > >>> > >> >> >>the
>> > > >>> > >> >> >> >>top
>> > > >>> > >> >> >> >> > > >> priority for close() is to close the
>>producer
>> > > >>>rather
>> > > >>> > >>than
>> > > >>> > >> >> >>flush()
>> > > >>> > >> >> >> >> > data,
>> > > >>> > >> >> >> >> > > >>so
>> > > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
>> > waiting
>> > > >>>for
>> > > >>> > >>its
>> > > >>> > >> >>main
>> > > >>> > >> >> >> >>job.
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
>>flag.
>> > It
>> > > >>>is
>> > > >>> > >>not a
>> > > >>> > >> >> >>public
>> > > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
>>Changes
>> > > >>>section
>> > > >>> > >> >>which I
>> > > >>> > >> >> >> >> thought
>> > > >>> > >> >> >> >> > > >>was
>> > > >>> > >> >> >> >> > > >> supposed to provide implementation details.
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> Thanks again for all the comments and
>> > suggestions!
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>> > > >>> jqin@linkedin.com>
>> > > >>> > >> >> wrote:
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
>> > comments.
>> > > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
>>if no
>> > > >>>further
>> > > >>> > >> >> >>comments
>> > > >>> > >> >> >> >>are
>> > > >>> > >> >> >> >> > > >> >received by tomorrow.
>> > > >>> > >> >> >> >> > > >> >
>> > > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>> > > >>> > >> >> >> >> > > >> >
>> > > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>> > > >>><jay.kreps@gmail.com
>> > > >>> >
>> > > >>> > >> >>wrote:
>> > > >>> > >> >> >> >> > > >> >
>> > > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and
>>use
>> > > >>>cases
>> > > >>> for
>> > > >>> > >>the
>> > > >>> > >> >> >> >>feature?
>> > > >>> > >> >> >> >> > > >>This
>> > > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
>> > > >>>aborting
>> > > >>> > >>from
>> > > >>> > >> >> >>inside
>> > > >>> > >> >> >> >>the
>> > > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
>>that
>> > > >>>usage
>> > > >>> or
>> > > >>> > >>why
>> > > >>> > >> >> >>other
>> > > >>> > >> >> >> >> > people
>> > > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
>>list
>> > the
>> > > >>> > >>primary
>> > > >>> > >> >>use
>> > > >>> > >> >> >> >>case
>> > > >>> > >> >> >> >> for
>> > > >>> > >> >> >> >> > > >> >>having
>> > > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
>> > avoid
>> > > >>> > >>blocking
>> > > >>> > >> >>too
>> > > >>> > >> >> >> >>long
>> > > >>> > >> >> >> >> on
>> > > >>> > >> >> >> >> > > >> >>shutdown.
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>-Jay
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
>>Jiangjie Qin
>> > > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>> > > >>> > >> >> >> >> > > >> >
>> > > >>> > >> >> >> >> > > >> >>wrote:
>> > > >>> > >> >> >> >> > > >> >>
>> > > >>> > >> >> >> >> > > >> >>> Hi,
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>> > > >>>close(timeout)
>> > > >>> to
>> > > >>> > >>new
>> > > >>> > >> >> >> >> producer.
>> > > >>> > >> >> >> >> > > >>Most
>> > > >>> > >> >> >> >> > > >> >>>of
>> > > >>> > >> >> >> >> > > >> >>> the previous discussions are in
>>KAFKA-1660
>> > > >>>where
>> > > >>> > >>Parth
>> > > >>> > >> >> >> >> Brahmbhatt
>> > > >>> > >> >> >> >> > > >>has
>> > > >>> > >> >> >> >> > > >> >>> already done a lot of work.
>> > > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
>>we are
>> > > >>>going
>> > > >>> > >> >>through
>> > > >>> > >> >> >>the
>> > > >>> > >> >> >> >> KIP
>> > > >>> > >> >> >> >> > > >> >>>process.
>> > > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >>
>> > > >>> > >> >>
>> > > >>> > >>
>> > > >>> >
>> > > >>>
>> > > 
>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> > > >>> > >> >> >> >> > > >> >>>2
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >> >>> Thanks.
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> > > >>> > >> >> >> >> > > >> >>>
>> > > >>> > >> >> >> >> > > >> >
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > > >>
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> > >
>> > > >>> > >> >> >> >> >
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >> --
>> > > >>> > >> >> >> >> Thanks,
>> > > >>> > >> >> >> >> Neha
>> > > >>> > >> >> >> >>
>> > > >>> > >> >> >> >
>> > > >>> > >> >> >> >
>> > > >>> > >> >> >> >
>> > > >>> > >> >> >> >--
>> > > >>> > >> >> >> >-- Guozhang
>> > > >>> > >> >> >>
>> > > >>> > >> >> >>
>> > > >>> > >> >> >
>> > > >>> > >> >> >
>> > > >>> > >> >> >--
>> > > >>> > >> >> >-- Guozhang
>> > > >>> > >> >>
>> > > >>> > >> >>
>> > > >>> > >> >
>> > > >>> > >> >
>> > > >>> > >> >--
>> > > >>> > >> >-- Guozhang
>> > > >>> > >>
>> > > >>> > >>
>> > > >>> >
>> > > >>> >
>> > > >>>
>> > > >>
>> > > >>
>> > > >>
>> > > >>--
>> > > >>-- Guozhang
>> > > >
>> > >
>> > >
>> >
>> 
>> 
>> 
>> -- 
>> Thanks,
>> Neha
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Joel Koshy <jj...@gmail.com>.

(1) should work, but as Jun suggested earlier in the thread it is
slightly misleading. The (intuitive) post-condition of "close" is that
the producer has shutdown - i.e., its sender thread, closed its
metrics, serializer/deserializer, etc. That is not necessarily a
post-condition of "close(0)" although one can contend that if you call
the method in non-blocking mode (zero timeout) then it is reasonable
to not expect that post-condition.

So I think that although (2) adds one more API it brings "simplicity
by virtue of overall clarity".

I would be in favor of (2) but not strongly opposed to (1).

Thanks,

Joel

On Thu, Mar 19, 2015 at 10:05:04AM -0700, Neha Narkhede wrote:
> I'm in favor of (1) for the sake of simplicity and as Jay mentions to
> reduce the number of different APIs. Can you explain when (1) does not work?
> 
> On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:
> 
> > Personally I'm in favor of (1) just to reduce the number of different APIs.
> > People will find the difference between abort and close subtle and
> > confusing and the only instance where you want it is this somewhat unusual
> > case you guys are pursuing, right?
> >
> > -Jay
> >
> > On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> > wrote:
> >
> > > It looks we have another option and are now deciding between the
> > following
> > > two interfaces:
> > >
> > > 1. Close() + close(timeout)
> > >   - timeout could be either positive or zero.
> > >   - only close(0) can be called from sender thread
> > >
> > > 2. Close() + abort() + close(timeout)
> > >   - timeout can either be positive or zero
> > >   - only abort() can be called from sender thread
> > >
> > >   - abort() is equivalent to close(0) in 1) but does not join sender
> > > thread and does not close metrics.
> > >   - Another thread has to call close() or close(timeout) in order to make
> > > sure the resources in producer are gone.
> > >
> > > The tow approach provides the same function we need, the difference is
> > > approach 2) follows convention of close() and abort(). On the other hand,
> > > approach 1) saves one interface compared with approach 2) but does not
> > > follow the convention.
> > >
> > > When the two approaches come to user code, it is probably something like
> > > this:
> > >
> > > Try {
> > >   While(!finished)
> > >     Producer.send(record, callback)
> > > } catch (Exception e) {
> > >   Producer.close(5)
> > > }
> > >
> > > Class CallbackImpl implements Callback {
> > >   onCompletion(RecordMetadata metadata Exception e) {
> > >     If (e != null)
> > >       Abort() / close()
> > >   }
> > > }
> > >
> > > Because the two approach leads to almost the same user code, assuming
> > > users are always calling producer.close() as a clean up step, personally
> > I
> > > prefer approach 2) as it follows convention.
> > >
> > > Any thoughts?
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> > >
> > > >Hi Jun,
> > > >
> > > >Yes, as Guozhang said, the main reason we set a flag is because close(0)
> > > >is expected to be called by sender thread itself.
> > > >If we want to maintain the semantic meaning of close(), one alternative
> > is
> > > >to have an abort() method does the same thing as close(0) except
> > cleanup.
> > > >And in close(timeout), after timeout we call abort() and join the sender
> > > >thread. This was one of the previous proposal. We merged abort to
> > close(0)
> > > >because they are almost doing the same thing. But from what you
> > mentioned,
> > > >it might make sense to have two separate methods.
> > > >
> > > >Thanks.
> > > >
> > > >Jiangjie (Becket) Qin
> > > >
> > > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> > > >
> > > >>Yeah in this sense the sender thread will not exist immediately in the
> > > >>close(0) call, but will only terminate after the current response batch
> > > >>has
> > > >>been processed, as will the producer instance itself.
> > > >>
> > > >>There is a reason for this though: for a clean shutdown the caller
> > thread
> > > >>has to wait for the sender thread to join before closing the producer
> > > >>instance, but this cannot be achieve if close(0) is called by the
> > sender
> > > >>thread itself (for example in KAFKA-1659, there is a proposal from
> > Andrew
> > > >>Stein on using thread.interrupt and thread.stop, but if it is called by
> > > >>the
> > > >>ioThread itself the stop call will fail). Hence we came up with the
> > flag
> > > >>approach to let the sender thread to close as soon as it is at the
> > > >>barrier
> > > >>of the run loop.
> > > >>
> > > >>Guozhang
> > > >>
> > > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
> > > >>
> > > >>> Hmm, does that mean that after close(0), the sender thread is not
> > > >>>necessary
> > > >>> gone? Normally, after closing an entity, we expect all internal
> > threads
> > > >>> associated with the entity are shut down completely.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Jun
> > > >>>
> > > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> > > >>><jq...@linkedin.com.invalid>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi Jun,
> > > >>> >
> > > >>> > Close(0) will set two flags in sender. Running=false and a newly
> > > >>>added
> > > >>> > forceClose=true. It will also set accumulator.closed=true so no
> > > >>>further
> > > >>> > producer.send() will succeed.
> > > >>> > The sender thread will finish executing all the callbacks in
> > current
> > > >>> batch
> > > >>> > of responses, then it will see the forceClose flag. It will just
> > fail
> > > >>>all
> > > >>> > the incomplete batches in the producer and exit.
> > > >>> > So close(0) is a non-blocking call and sender thread will not try
> > to
> > > >>>join
> > > >>> > itself in close(0).
> > > >>> >
> > > >>> > Thanks.
> > > >>> >
> > > >>> > Jiangjie (Becket) Qin
> > > >>> >
> > > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> > > >>> >
> > > >>> > >How does close(0) work if it's called from the sender thread? If
> > > >>> close(0)
> > > >>> > >needs to wait for the sender thread to join, wouldn't this cause a
> > > >>> > >deadlock?
> > > >>> > >
> > > >>> > >Thanks,
> > > >>> > >
> > > >>> > >Jun
> > > >>> > >
> > > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> > > >>><jqin@linkedin.com.invalid
> > > >>> >
> > > >>> > >wrote:
> > > >>> > >
> > > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> > > >>> > >> discussing with you :)
> > > >>> > >>
> > > >>> > >> Jiangjie (Becket) Qin
> > > >>> > >>
> > > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com>
> > wrote:
> > > >>> > >>
> > > >>> > >> >Thanks Jiangjie,
> > > >>> > >> >
> > > >>> > >> >After talking to you offline on this, I have been convinced and
> > > >>> > >>changed my
> > > >>> > >> >preference to blocking. The immediate shutdown approach does
> > have
> > > >>> some
> > > >>> > >> >unsafeness in some cases.
> > > >>> > >> >
> > > >>> > >> >Guozhang
> > > >>> > >> >
> > > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > > >>> > >><jqin@linkedin.com.invalid
> > > >>> > >> >
> > > >>> > >> >wrote:
> > > >>> > >> >
> > > >>> > >> >> It looks that the problem we want to solve and the purpose we
> > > >>>want
> > > >>> to
> > > >>> > >> >> achieve is:
> > > >>> > >> >> If user uses close() in callback, we want to let user be
> > aware
> > > >>>that
> > > >>> > >>they
> > > >>> > >> >> should use close(0) instead of close() in the callback.
> > > >>> > >> >>
> > > >>> > >> >> We have agreed that we will have an error log to inform user
> > > >>>about
> > > >>> > >>this
> > > >>> > >> >> mis-usage. The options differ in the way how we can force
> > user
> > > >>>to
> > > >>> > >>take a
> > > >>> > >> >> look at that error log.
> > > >>> > >> >> There are two scenarios:
> > > >>> > >> >> 1. User does not expect the program to exit.
> > > >>> > >> >> 2. User expect the program to exit.
> > > >>> > >> >>
> > > >>> > >> >> For scenario 1), blocking will probably delay the discovery
> > of
> > > >>>the
> > > >>> > >> >> problem. Calling close(0) exposes the problem quicker. In
> > this
> > > >>> > >>scenario
> > > >>> > >> >> producer just encounter a send failure when running normally.
> > > >>> > >> >> For scenario 2), blocking will expose the problem quick.
> > > >>>Calling
> > > >>> > >> >>close(-1)
> > > >>> > >> >> might hide the problem. This scenario might include: a) Unit
> > > >>>test
> > > >>> > >>for a
> > > >>> > >> >> send failure. b) Message sending during a close() call from a
> > > >>>user
> > > >>> > >> >>thread.
> > > >>> > >> >>
> > > >>> > >> >> So as a summary table:
> > > >>> > >> >>
> > > >>> > >> >>                   Scenario 1)
> >  Scenario
> > > >>>2)
> > > >>> > >> >>
> > > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> > > >>>problem
> > > >>> > >> >>discovery
> > > >>> > >> >>
> > > >>> > >> >> Close(-1)     Immediate problem discovery     Problem might
> > be
> > > >>> hidden
> > > >>> > >> >>
> > > >>> > >> >>
> > > >>> > >> >> Personally I prefer blocking because it seems providing more
> > > >>> > >>guarantees
> > > >>> > >> >> and safer.
> > > >>> > >> >>
> > > >>> > >> >> Thanks.
> > > >>> > >> >>
> > > >>> > >> >> Jiangjie (Becket) Qin
> > > >>> > >> >>
> > > >>> > >> >>
> > > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
> > > >>>wrote:
> > > >>> > >> >>
> > > >>> > >> >> >HI Jiangjie,
> > > >>> > >> >> >
> > > >>> > >> >> >As far as I understand calling close() in the ioThread is
> > not
> > > >>> > >>common,
> > > >>> > >> >>as
> > > >>> > >> >> >it
> > > >>> > >> >> >may only trigger when we saw some non-retriable error. Hence
> > > >>>when
> > > >>> > >>user
> > > >>> > >> >>run
> > > >>> > >> >> >their program it is unlikely that close() will be triggered
> > > >>>and
> > > >>> > >>problem
> > > >>> > >> >> >will be detected. So it seems to me that from the error
> > > >>>detection
> > > >>> > >> >>aspect
> > > >>> > >> >> >these two options seems to be the same as people will
> > usually
> > > >>> > >>detect it
> > > >>> > >> >> >from the producer metrics all dropping to 0.
> > > >>> > >> >> >
> > > >>> > >> >> >Guozhang
> > > >>> > >> >> >
> > > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > > >>> > >> >><jq...@linkedin.com.invalid>
> > > >>> > >> >> >wrote:
> > > >>> > >> >> >
> > > >>> > >> >> >> It seems there are two options we can choose from when
> > > >>>close()
> > > >>> is
> > > >>> > >> >>called
> > > >>> > >> >> >> from sender thread (callback):
> > > >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> > > >>> > >> >> >> 2. Log an error and block.
> > > >>> > >> >> >> (Throwing an exception will not work because we catch all
> > > >>>the
> > > >>> > >> >>exception
> > > >>> > >> >> >> thrown from user callback. It will just lead to an error
> > > >>>log.)
> > > >>> > >> >> >>
> > > >>> > >> >> >> My concern for the first option is that the producer will
> > be
> > > >>> > >>closed
> > > >>> > >> >>even
> > > >>> > >> >> >> if we logged and error. I am wondering if some user would
> > > >>>not
> > > >>> even
> > > >>> > >> >>take
> > > >>> > >> >> >>a
> > > >>> > >> >> >> look at the log if producer is closed normally. Because
> > from
> > > >>>the
> > > >>> > >> >> >>programs
> > > >>> > >> >> >> behavior, everything looks good. If that is the case, the
> > > >>>error
> > > >>> > >> >>message
> > > >>> > >> >> >>we
> > > >>> > >> >> >> logged probably will just be ignored until some day when
> > > >>>people
> > > >>> > >>check
> > > >>> > >> >> >>the
> > > >>> > >> >> >> log and see it.
> > > >>> > >> >> >>
> > > >>> > >> >> >> As for the second option, because producer does not close
> > > >>>but
> > > >>> > >>blocks.
> > > >>> > >> >> >>User
> > > >>> > >> >> >> will notice this the first time they run the program. They
> > > >>> > >>probably
> > > >>> > >> >>will
> > > >>> > >> >> >> look at the log to see why producer could not be closed
> > and
> > > >>>they
> > > >>> > >>will
> > > >>> > >> >> >>see
> > > >>> > >> >> >> the error log we put there. So they will get informed
> > about
> > > >>>this
> > > >>> > >> >> >>mis-usage
> > > >>> > >> >> >> of close() in sender thread the first time they run the
> > code
> > > >>> > >>instead
> > > >>> > >> >>of
> > > >>> > >> >> >> some time later.
> > > >>> > >> >> >>
> > > >>> > >> >> >> Personally I prefer the second one because it is more
> > > >>>obvious
> > > >>> that
> > > >>> > >> >> >> something was wrong.
> > > >>> > >> >> >>
> > > >>> > >> >> >> Jiangjie (Becket) Qin
> > > >>> > >> >> >>
> > > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
> > > >>> wrote:
> > > >>> > >> >> >>
> > > >>> > >> >> >> >Yeah I agree we should not silently change the behavior
> > of
> > > >>>the
> > > >>> > >> >>function
> > > >>> > >> >> >> >with the given parameters; and I would prefer
> > > >>> > >> >> >>error-logging-and-shutdown
> > > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> > > >>>suggested
> > > >>> > >> >>blocking
> > > >>> > >> >> >> >would also not proceed with sending any data, bu will
> > just
> > > >>>let
> > > >>> > >> >>users to
> > > >>> > >> >> >> >realize the issue later than sooner.
> > > >>> > >> >> >> >
> > > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > > >>> > >><ne...@confluent.io>
> > > >>> > >> >> >>wrote:
> > > >>> > >> >> >> >
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > And I also agree it is better if we can make producer
> > > >>>block
> > > >>> > >>when
> > > >>> > >> >> >> >> > close() is called from sender thread so user will
> > > >>>notice
> > > >>> > >> >>something
> > > >>> > >> >> >> >>went
> > > >>> > >> >> >> >> > wrong.
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> This isn't a great experience either. Why can't we just
> > > >>>throw
> > > >>> > >>an
> > > >>> > >> >> >> >>exception
> > > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
> > > >>>user to
> > > >>> > >> >>know.
> > > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> > > >>>annoying.
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > > >>> > >><ja...@gmail.com>
> > > >>> > >> >> >> wrote:
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> > Cool.
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > I think blocking is good or alternately throwing an
> > > >>> exception
> > > >>> > >> >> >>directly
> > > >>> > >> >> >> >> from
> > > >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> > > >>>doing
> > > >>> > >> >>something
> > > >>> > >> >> >> >> slightly
> > > >>> > >> >> >> >> > different from what the user asked for as it will be
> > > >>>hard
> > > >>> to
> > > >>> > >> >>notice
> > > >>> > >> >> >> >>that
> > > >>> > >> >> >> >> > behavior difference.
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > -Jay
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> > > >>> > >> >> >> >><jqin@linkedin.com.invalid
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > wrote:
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >> > > Hi Jay,
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks
> > as
> > > >>> long
> > > >>> > >>as
> > > >>> > >> >>we
> > > >>> > >> >> >> >>have
> > > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> > interface,
> > > >>> there
> > > >>> > >> >>would
> > > >>> > >> >> >> >>be no
> > > >>> > >> >> >> >> > > problem. And I also agree it is better if we can
> > make
> > > >>> > >>producer
> > > >>> > >> >> >>block
> > > >>> > >> >> >> >> when
> > > >>> > >> >> >> >> > > close() is called from sender thread so user will
> > > >>>notice
> > > >>> > >> >> >>something
> > > >>> > >> >> >> >>went
> > > >>> > >> >> >> >> > > wrong.
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > Thanks.
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> > > >>><ja...@gmail.com>
> > > >>> > >> wrote:
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > >Hey Jiangjie,
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >I think this is going to be very confusing that
> > > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> > > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> > > >>> > >> >> >> >> > > >I understand this appears in other apis, but it
> > is a
> > > >>> > >>constant
> > > >>> > >> >> >> >>cause of
> > > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a
> > way
> > > >>>to
> > > >>> > >>wait
> > > >>> > >> >> >> >> indefinitely
> > > >>> > >> >> >> >> > > >as
> > > >>> > >> >> >> >> > > >we already have close() so having a magical
> > constant
> > > >>>for
> > > >>> > >> >>that is
> > > >>> > >> >> >> >> > > >redundant.
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> > > >>>possible
> > > >>> > >>and
> > > >>> > >> >> >>would
> > > >>> > >> >> >> >> block
> > > >>> > >> >> >> >> > > >indefinitely. I think trying to silently change
> > the
> > > >>> > >>behavior
> > > >>> > >> >>is
> > > >>> > >> >> >> >> probably
> > > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> > > >>> callback
> > > >>> > >> >>there
> > > >>> > >> >> >>is
> > > >>> > >> >> >> >> > > >actually
> > > >>> > >> >> >> >> > > >some misunderstanding and they need to think more,
> > > >>> > >>silently
> > > >>> > >> >> >>making
> > > >>> > >> >> >> >> this
> > > >>> > >> >> >> >> > > >not
> > > >>> > >> >> >> >> > > >block will hide the problem from them which is the
> > > >>> > >>opposite
> > > >>> > >> >>of
> > > >>> > >> >> >> >>what we
> > > >>> > >> >> >> >> > > >want.
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >-Jay
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> > > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > > >wrote:
> > > >>> > >> >> >> >> > > >
> > > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> > > >>>Since it
> > > >>> > >> >>seems
> > > >>> > >> >> >>we
> > > >>> > >> >> >> >> > probably
> > > >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> > > >>>replying
> > > >>> > >>from
> > > >>> > >> >>the
> > > >>> > >> >> >> >> > > >>discussion
> > > >>> > >> >> >> >> > > >> thread here.
> > > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> > > >>> > >>half-baked,
> > > >>> > >> >> >> >>apologize
> > > >>> > >> >> >> >> > for
> > > >>> > >> >> >> >> > > >> the rush...
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> The contract in current KIP is:
> > > >>> > >> >> >> >> > > >>   1. close() - wait until all requests either
> > are
> > > >>>sent
> > > >>> > >>or
> > > >>> > >> >> >>reach
> > > >>> > >> >> >> >> > request
> > > >>> > >> >> >> >> > > >> timeout.
> > > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> > > >>> > >>immediately
> > > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> > equivalent
> > > >>>to
> > > >>> > >> >>close(),
> > > >>> > >> >> >> >>i.e.
> > > >>> > >> >> >> >> > Wait
> > > >>> > >> >> >> >> > > >> until all requests are sent or reach request
> > > >>>timeout
> > > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
> > > >>>best to
> > > >>> > >> >>finish
> > > >>> > >> >> >> >> sending
> > > >>> > >> >> >> >> > > >>in 5
> > > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> > > >>>shutdown
> > > >>> the
> > > >>> > >> >> >>producer
> > > >>> > >> >> >> >> > > >>anyway,
> > > >>> > >> >> >> >> > > >> my callback will handle the failures.
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> About how we define what timeout value stands
> > for,
> > > >>>I
> > > >>> > >> >>actually
> > > >>> > >> >> >> >> > struggled
> > > >>> > >> >> >> >> > > >>a
> > > >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> > > >>>close(0)
> > > >>> > >> >>should
> > > >>> > >> >> >> >>mean
> > > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> > > >>>existing
> > > >>> java
> > > >>> > >> >>class
> > > >>> > >> >> >> >>have
> > > >>> > >> >> >> >> > this
> > > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
> > never
> > > >>> > >>timeout
> > > >>> > >> >> >> >> > > >>(Thread.join(0),
> > > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> > > >>>either we
> > > >>> > >> >>follow
> > > >>> > >> >> >>the
> > > >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> > > >>>chose is
> > > >>> > >>to
> > > >>> > >> >> >>follow
> > > >>> > >> >> >> >>the
> > > >>> > >> >> >> >> > > >> convention but document the interface to let
> > user
> > > >>>be
> > > >>> > >>aware
> > > >>> > >> >>of
> > > >>> > >> >> >>the
> > > >>> > >> >> >> >> > usage.
> > > >>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> > > >>> public
> > > >>> > >> >> >> >>interface so
> > > >>> > >> >> >> >> > it
> > > >>> > >> >> >> >> > > >> might be better to follow java convention.
> > Whereas
> > > >>> > >> >>selector is
> > > >>> > >> >> >> >>not a
> > > >>> > >> >> >> >> > > >> public interface that used by user, so as long
> > as
> > > >>>it
> > > >>> > >>makes
> > > >>> > >> >> >>sense
> > > >>> > >> >> >> >>to
> > > >>> > >> >> >> >> > us,
> > > >>> > >> >> >> >> > > >>it
> > > >>> > >> >> >> >> > > >> is less a problem to be different from java
> > > >>> convention.
> > > >>> > >> >>That
> > > >>> > >> >> >>said
> > > >>> > >> >> >> >> > since
> > > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> > interface,
> > > >>>I
> > > >>> > >>think
> > > >>> > >> >>it
> > > >>> > >> >> >> >>also
> > > >>> > >> >> >> >> > makes
> > > >>> > >> >> >> >> > > >> sense to make producer.close() to have the same
> > > >>> > >>definition
> > > >>> > >> >>of
> > > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
> > > >>>would
> > > >>> > >>be
> > > >>> > >> >> >> >>separating
> > > >>> > >> >> >> >> > the
> > > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> > probably
> > > >>> makes
> > > >>> > >> >> >>sense. I
> > > >>> > >> >> >> >> > would
> > > >>> > >> >> >> >> > > >> guess typically the request timeout would be
> > long
> > > >>> (e.g.
> > > >>> > >>60
> > > >>> > >> >> >> >>seconds)
> > > >>> > >> >> >> >> > > >> because we might want to consider retries with
> > > >>>back
> > > >>> off
> > > >>> > >> >>time.
> > > >>> > >> >> >>If
> > > >>> > >> >> >> >>we
> > > >>> > >> >> >> >> > have
> > > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
> > > >>>that
> > > >>> > >>could
> > > >>> > >> >>take
> > > >>> > >> >> >> >>up to
> > > >>> > >> >> >> >> > > >> several minutes to complete all the requests.
> > But
> > > >>>when
> > > >>> > >>we
> > > >>> > >> >> >>close a
> > > >>> > >> >> >> >> > > >> producer, we might not want to wait for that
> > long
> > > >>>as
> > > >>> it
> > > >>> > >> >>might
> > > >>> > >> >> >> >>cause
> > > >>> > >> >> >> >> > some
> > > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> There is also a subtle difference between
> > > >>> close(timeout)
> > > >>> > >> >>and
> > > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is
> > to
> > > >>> write
> > > >>> > >> >>data
> > > >>> > >> >> >>to
> > > >>> > >> >> >> >>the
> > > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> > > >>> request
> > > >>> > >> >> >>timeout.
> > > >>> > >> >> >> >>I
> > > >>> > >> >> >> >> > think
> > > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
> > > >>>other
> > > >>> > >> >>hand,
> > > >>> > >> >> >>the
> > > >>> > >> >> >> >>top
> > > >>> > >> >> >> >> > > >> priority for close() is to close the producer
> > > >>>rather
> > > >>> > >>than
> > > >>> > >> >> >>flush()
> > > >>> > >> >> >> >> > data,
> > > >>> > >> >> >> >> > > >>so
> > > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> > waiting
> > > >>>for
> > > >>> > >>its
> > > >>> > >> >>main
> > > >>> > >> >> >> >>job.
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag.
> > It
> > > >>>is
> > > >>> > >>not a
> > > >>> > >> >> >>public
> > > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> > > >>>section
> > > >>> > >> >>which I
> > > >>> > >> >> >> >> thought
> > > >>> > >> >> >> >> > > >>was
> > > >>> > >> >> >> >> > > >> supposed to provide implementation details.
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> > suggestions!
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> > > >>> jqin@linkedin.com>
> > > >>> > >> >> wrote:
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> > comments.
> > > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
> > > >>>further
> > > >>> > >> >> >>comments
> > > >>> > >> >> >> >>are
> > > >>> > >> >> >> >> > > >> >received by tomorrow.
> > > >>> > >> >> >> >> > > >> >
> > > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > > >>> > >> >> >> >> > > >> >
> > > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> > > >>><jay.kreps@gmail.com
> > > >>> >
> > > >>> > >> >>wrote:
> > > >>> > >> >> >> >> > > >> >
> > > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> > > >>>cases
> > > >>> for
> > > >>> > >>the
> > > >>> > >> >> >> >>feature?
> > > >>> > >> >> >> >> > > >>This
> > > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> > > >>>aborting
> > > >>> > >>from
> > > >>> > >> >> >>inside
> > > >>> > >> >> >> >>the
> > > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
> > > >>>usage
> > > >>> or
> > > >>> > >>why
> > > >>> > >> >> >>other
> > > >>> > >> >> >> >> > people
> > > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list
> > the
> > > >>> > >>primary
> > > >>> > >> >>use
> > > >>> > >> >> >> >>case
> > > >>> > >> >> >> >> for
> > > >>> > >> >> >> >> > > >> >>having
> > > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
> > avoid
> > > >>> > >>blocking
> > > >>> > >> >>too
> > > >>> > >> >> >> >>long
> > > >>> > >> >> >> >> on
> > > >>> > >> >> >> >> > > >> >>shutdown.
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>-Jay
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > > >>> > >> >> >> >> > > >> >
> > > >>> > >> >> >> >> > > >> >>wrote:
> > > >>> > >> >> >> >> > > >> >>
> > > >>> > >> >> >> >> > > >> >>> Hi,
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> > > >>>close(timeout)
> > > >>> to
> > > >>> > >>new
> > > >>> > >> >> >> >> producer.
> > > >>> > >> >> >> >> > > >>Most
> > > >>> > >> >> >> >> > > >> >>>of
> > > >>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
> > > >>>where
> > > >>> > >>Parth
> > > >>> > >> >> >> >> Brahmbhatt
> > > >>> > >> >> >> >> > > >>has
> > > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> > > >>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
> > > >>>going
> > > >>> > >> >>through
> > > >>> > >> >> >>the
> > > >>> > >> >> >> >> KIP
> > > >>> > >> >> >> >> > > >> >>>process.
> > > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >>
> > > >>> > >> >> >>
> > > >>> > >> >>
> > > >>> > >>
> > > >>> >
> > > >>>
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > > >>> > >> >> >> >> > > >> >>>2
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >> >>> Thanks.
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > > >>> > >> >> >> >> > > >> >>>
> > > >>> > >> >> >> >> > > >> >
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > > >>
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> > >
> > > >>> > >> >> >> >> >
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >> --
> > > >>> > >> >> >> >> Thanks,
> > > >>> > >> >> >> >> Neha
> > > >>> > >> >> >> >>
> > > >>> > >> >> >> >
> > > >>> > >> >> >> >
> > > >>> > >> >> >> >
> > > >>> > >> >> >> >--
> > > >>> > >> >> >> >-- Guozhang
> > > >>> > >> >> >>
> > > >>> > >> >> >>
> > > >>> > >> >> >
> > > >>> > >> >> >
> > > >>> > >> >> >--
> > > >>> > >> >> >-- Guozhang
> > > >>> > >> >>
> > > >>> > >> >>
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> >--
> > > >>> > >> >-- Guozhang
> > > >>> > >>
> > > >>> > >>
> > > >>> >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >>--
> > > >>-- Guozhang
> > > >
> > >
> > >
> >
> 
> 
> 
> -- 
> Thanks,
> Neha

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

That's true. However, in general, timeout is not going to be enforced
precisely due to other things like GCs.

Thanks,

Jun

On Fri, Mar 20, 2015 at 4:39 PM, Guozhang Wang <wa...@gmail.com> wrote:

> With this approach close(timeout > 0) will not actually close by the
> elapsed timeout right? This semantics mismatch is a bit concerning to me..
>
> Guozhang
>
> On Fri, Mar 20, 2015 at 2:37 PM, Jun Rao <ju...@confluent.io> wrote:
>
> > Yes, that probably is simpler. Basically, close() and close(timeout) will
> > always wait for the sender thread to complete. Close(0) will just
> initiate
> > the shutdown of the sender thread, but not waiting for the thread to
> > complete.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Mar 20, 2015 at 11:29 AM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > wrote:
> >
> > > That’s a very good point.
> > > Currently if timeout > 0, it will wait up to timeout for the sender
> > thread
> > > to complete. After that it will do a force close but not block.
> > >
> > > Maybe we should make it to be:
> > > If timeout > 0, wait up to timeout for the sender thread to complete.
> > > After that, make a force close but still try to join the sender thread.
> > > If timeout = 0, it is non-blocking.
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On 3/20/15, 9:16 AM, "Jun Rao" <ju...@confluent.io> wrote:
> > >
> > > >Got it. Just to clarify, does close(timeout) always wait for the
> sender
> > > >thread to complete?
> > > >
> > > >Thanks,
> > > >
> > > >Jun
> > > >
> > > >On Fri, Mar 20, 2015 at 7:41 AM, Jiangjie Qin
> <jqin@linkedin.com.invalid
> > >
> > > >wrote:
> > > >
> > > >> I think the we agreed that we are going to log an error and block.
> By
> > > >> doing this we can make sure the error log to be checked by user in
> all
> > > >> cases.
> > > >> If we silently replace close() to close(0) in sender thread, in some
> > > >>cases
> > > >> such as send error during a normal close(), user might not notice
> > > >> something went wrong.
> > > >>
> > > >> On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:
> > > >>
> > > >> >close can probably just getCurrentThread and check if ==
> > senderThread.
> > > >> >I'm actually not sure from this thread if there was clear agreement
> > on
> > > >> >whether it should change close(timeout)/close() to close(0) or if
> it
> > > >> >should
> > > >> >log an error and block up to the timeout.
> > > >> >
> > > >> >On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
> > > >> >
> > > >> >> So in (1), if a close() or close(timeout) is called from a
> > callback,
> > > >>we
> > > >> >> will just turn that into a close(0)? Implementation wise, how do
> we
> > > >>know
> > > >> >> whether a close() call is made from the sender thread or not?
> > > >> >>
> > > >> >> Thanks,
> > > >> >>
> > > >> >> Jun
> > > >> >>
> > > >> >> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
> > > >> >><jq...@linkedin.com.invalid>
> > > >> >> wrote:
> > > >> >>
> > > >> >> > It looks we have another option and are now deciding between
> the
> > > >> >> following
> > > >> >> > two interfaces:
> > > >> >> >
> > > >> >> > 1. Close() + close(timeout)
> > > >> >> >   - timeout could be either positive or zero.
> > > >> >> >   - only close(0) can be called from sender thread
> > > >> >> >
> > > >> >> > 2. Close() + abort() + close(timeout)
> > > >> >> >   - timeout can either be positive or zero
> > > >> >> >   - only abort() can be called from sender thread
> > > >> >> >
> > > >> >> >   - abort() is equivalent to close(0) in 1) but does not join
> > > >>sender
> > > >> >> > thread and does not close metrics.
> > > >> >> >   - Another thread has to call close() or close(timeout) in
> order
> > > >>to
> > > >> >>make
> > > >> >> > sure the resources in producer are gone.
> > > >> >> >
> > > >> >> > The tow approach provides the same function we need, the
> > > >>difference is
> > > >> >> > approach 2) follows convention of close() and abort(). On the
> > other
> > > >> >>hand,
> > > >> >> > approach 1) saves one interface compared with approach 2) but
> > does
> > > >>not
> > > >> >> > follow the convention.
> > > >> >> >
> > > >> >> > When the two approaches come to user code, it is probably
> > something
> > > >> >>like
> > > >> >> > this:
> > > >> >> >
> > > >> >> > Try {
> > > >> >> >   While(!finished)
> > > >> >> >     Producer.send(record, callback)
> > > >> >> > } catch (Exception e) {
> > > >> >> >   Producer.close(5)
> > > >> >> > }
> > > >> >> >
> > > >> >> > Class CallbackImpl implements Callback {
> > > >> >> >   onCompletion(RecordMetadata metadata Exception e) {
> > > >> >> >     If (e != null)
> > > >> >> >       Abort() / close()
> > > >> >> >   }
> > > >> >> > }
> > > >> >> >
> > > >> >> > Because the two approach leads to almost the same user code,
> > > >>assuming
> > > >> >> > users are always calling producer.close() as a clean up step,
> > > >> >>personally
> > > >> >> I
> > > >> >> > prefer approach 2) as it follows convention.
> > > >> >> >
> > > >> >> > Any thoughts?
> > > >> >> >
> > > >> >> > Jiangjie (Becket) Qin
> > > >> >> >
> > > >> >> >
> > > >> >> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
> > > >> >><javascript:;>>
> > > >> >> wrote:
> > > >> >> >
> > > >> >> > >Hi Jun,
> > > >> >> > >
> > > >> >> > >Yes, as Guozhang said, the main reason we set a flag is
> because
> > > >> >>close(0)
> > > >> >> > >is expected to be called by sender thread itself.
> > > >> >> > >If we want to maintain the semantic meaning of close(), one
> > > >> >>alternative
> > > >> >> is
> > > >> >> > >to have an abort() method does the same thing as close(0)
> except
> > > >> >> cleanup.
> > > >> >> > >And in close(timeout), after timeout we call abort() and join
> > the
> > > >> >>sender
> > > >> >> > >thread. This was one of the previous proposal. We merged abort
> > to
> > > >> >> close(0)
> > > >> >> > >because they are almost doing the same thing. But from what
> you
> > > >> >> mentioned,
> > > >> >> > >it might make sense to have two separate methods.
> > > >> >> > >
> > > >> >> > >Thanks.
> > > >> >> > >
> > > >> >> > >Jiangjie (Becket) Qin
> > > >> >> > >
> > > >> >> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
> > > >> >> <javascript:;>> wrote:
> > > >> >> > >
> > > >> >> > >>Yeah in this sense the sender thread will not exist
> immediately
> > > >>in
> > > >> >>the
> > > >> >> > >>close(0) call, but will only terminate after the current
> > response
> > > >> >>batch
> > > >> >> > >>has
> > > >> >> > >>been processed, as will the producer instance itself.
> > > >> >> > >>
> > > >> >> > >>There is a reason for this though: for a clean shutdown the
> > > >>caller
> > > >> >> thread
> > > >> >> > >>has to wait for the sender thread to join before closing the
> > > >> >>producer
> > > >> >> > >>instance, but this cannot be achieve if close(0) is called by
> > the
> > > >> >> sender
> > > >> >> > >>thread itself (for example in KAFKA-1659, there is a proposal
> > > >>from
> > > >> >> Andrew
> > > >> >> > >>Stein on using thread.interrupt and thread.stop, but if it is
> > > >> >>called by
> > > >> >> > >>the
> > > >> >> > >>ioThread itself the stop call will fail). Hence we came up
> with
> > > >>the
> > > >> >> flag
> > > >> >> > >>approach to let the sender thread to close as soon as it is
> at
> > > >>the
> > > >> >> > >>barrier
> > > >> >> > >>of the run loop.
> > > >> >> > >>
> > > >> >> > >>Guozhang
> > > >> >> > >>
> > > >> >> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
> > > >> >> <javascript:;>> wrote:
> > > >> >> > >>
> > > >> >> > >>> Hmm, does that mean that after close(0), the sender thread
> is
> > > >>not
> > > >> >> > >>>necessary
> > > >> >> > >>> gone? Normally, after closing an entity, we expect all
> > internal
> > > >> >> threads
> > > >> >> > >>> associated with the entity are shut down completely.
> > > >> >> > >>>
> > > >> >> > >>> Thanks,
> > > >> >> > >>>
> > > >> >> > >>> Jun
> > > >> >> > >>>
> > > >> >> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> > > >> >> > >>><jq...@linkedin.com.invalid>
> > > >> >> > >>> wrote:
> > > >> >> > >>>
> > > >> >> > >>> > Hi Jun,
> > > >> >> > >>> >
> > > >> >> > >>> > Close(0) will set two flags in sender. Running=false and
> a
> > > >>newly
> > > >> >> > >>>added
> > > >> >> > >>> > forceClose=true. It will also set accumulator.closed=true
> > so
> > > >>no
> > > >> >> > >>>further
> > > >> >> > >>> > producer.send() will succeed.
> > > >> >> > >>> > The sender thread will finish executing all the callbacks
> > in
> > > >> >> current
> > > >> >> > >>> batch
> > > >> >> > >>> > of responses, then it will see the forceClose flag. It
> will
> > > >>just
> > > >> >> fail
> > > >> >> > >>>all
> > > >> >> > >>> > the incomplete batches in the producer and exit.
> > > >> >> > >>> > So close(0) is a non-blocking call and sender thread will
> > not
> > > >> >>try
> > > >> >> to
> > > >> >> > >>>join
> > > >> >> > >>> > itself in close(0).
> > > >> >> > >>> >
> > > >> >> > >>> > Thanks.
> > > >> >> > >>> >
> > > >> >> > >>> > Jiangjie (Becket) Qin
> > > >> >> > >>> >
> > > >> >> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io
> > > >> <javascript:;>>
> > > >> >> wrote:
> > > >> >> > >>> >
> > > >> >> > >>> > >How does close(0) work if it's called from the sender
> > > >>thread?
> > > >> >>If
> > > >> >> > >>> close(0)
> > > >> >> > >>> > >needs to wait for the sender thread to join, wouldn't
> this
> > > >> >>cause a
> > > >> >> > >>> > >deadlock?
> > > >> >> > >>> > >
> > > >> >> > >>> > >Thanks,
> > > >> >> > >>> > >
> > > >> >> > >>> > >Jun
> > > >> >> > >>> > >
> > > >> >> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> > > >> >> > >>><jqin@linkedin.com.invalid
> > > >> >> > >>> >
> > > >> >> > >>> > >wrote:
> > > >> >> > >>> > >
> > > >> >> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly
> considered
> > > >> >>without
> > > >> >> > >>> > >> discussing with you :)
> > > >> >> > >>> > >>
> > > >> >> > >>> > >> Jiangjie (Becket) Qin
> > > >> >> > >>> > >>
> > > >> >> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <
> > wangguoz@gmail.com
> > > >> >> <javascript:;>> wrote:
> > > >> >> > >>> > >>
> > > >> >> > >>> > >> >Thanks Jiangjie,
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >After talking to you offline on this, I have been
> > > >>convinced
> > > >> >>and
> > > >> >> > >>> > >>changed my
> > > >> >> > >>> > >> >preference to blocking. The immediate shutdown
> approach
> > > >>does
> > > >> >> have
> > > >> >> > >>> some
> > > >> >> > >>> > >> >unsafeness in some cases.
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >Guozhang
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > > >> >> > >>> > >><jqin@linkedin.com.invalid
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >wrote:
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >> It looks that the problem we want to solve and the
> > > >> >>purpose we
> > > >> >> > >>>want
> > > >> >> > >>> to
> > > >> >> > >>> > >> >> achieve is:
> > > >> >> > >>> > >> >> If user uses close() in callback, we want to let
> user
> > > >>be
> > > >> >> aware
> > > >> >> > >>>that
> > > >> >> > >>> > >>they
> > > >> >> > >>> > >> >> should use close(0) instead of close() in the
> > callback.
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> We have agreed that we will have an error log to
> > inform
> > > >> >>user
> > > >> >> > >>>about
> > > >> >> > >>> > >>this
> > > >> >> > >>> > >> >> mis-usage. The options differ in the way how we can
> > > >>force
> > > >> >> user
> > > >> >> > >>>to
> > > >> >> > >>> > >>take a
> > > >> >> > >>> > >> >> look at that error log.
> > > >> >> > >>> > >> >> There are two scenarios:
> > > >> >> > >>> > >> >> 1. User does not expect the program to exit.
> > > >> >> > >>> > >> >> 2. User expect the program to exit.
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> For scenario 1), blocking will probably delay the
> > > >> >>discovery
> > > >> >> of
> > > >> >> > >>>the
> > > >> >> > >>> > >> >> problem. Calling close(0) exposes the problem
> > quicker.
> > > >>In
> > > >> >> this
> > > >> >> > >>> > >>scenario
> > > >> >> > >>> > >> >> producer just encounter a send failure when running
> > > >> >>normally.
> > > >> >> > >>> > >> >> For scenario 2), blocking will expose the problem
> > > >>quick.
> > > >> >> > >>>Calling
> > > >> >> > >>> > >> >>close(-1)
> > > >> >> > >>> > >> >> might hide the problem. This scenario might
> include:
> > a)
> > > >> >>Unit
> > > >> >> > >>>test
> > > >> >> > >>> > >>for a
> > > >> >> > >>> > >> >> send failure. b) Message sending during a close()
> > call
> > > >> >>from a
> > > >> >> > >>>user
> > > >> >> > >>> > >> >>thread.
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> So as a summary table:
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >>                   Scenario 1)
> > > >> >>  Scenario
> > > >> >> > >>>2)
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> Blocking      Delay problem discovery
> > > >>Guaranteed
> > > >> >> > >>>problem
> > > >> >> > >>> > >> >>discovery
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> Close(-1)     Immediate problem discovery
>  Problem
> > > >> >>might
> > > >> >> be
> > > >> >> > >>> hidden
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> Personally I prefer blocking because it seems
> > providing
> > > >> >>more
> > > >> >> > >>> > >>guarantees
> > > >> >> > >>> > >> >> and safer.
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> Thanks.
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
> > > >><wangguoz@gmail.com
> > > >> >> <javascript:;>>
> > > >> >> > >>>wrote:
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >> >HI Jiangjie,
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >As far as I understand calling close() in the
> > > >>ioThread is
> > > >> >> not
> > > >> >> > >>> > >>common,
> > > >> >> > >>> > >> >>as
> > > >> >> > >>> > >> >> >it
> > > >> >> > >>> > >> >> >may only trigger when we saw some non-retriable
> > error.
> > > >> >>Hence
> > > >> >> > >>>when
> > > >> >> > >>> > >>user
> > > >> >> > >>> > >> >>run
> > > >> >> > >>> > >> >> >their program it is unlikely that close() will be
> > > >> >>triggered
> > > >> >> > >>>and
> > > >> >> > >>> > >>problem
> > > >> >> > >>> > >> >> >will be detected. So it seems to me that from the
> > > >>error
> > > >> >> > >>>detection
> > > >> >> > >>> > >> >>aspect
> > > >> >> > >>> > >> >> >these two options seems to be the same as people
> > will
> > > >> >> usually
> > > >> >> > >>> > >>detect it
> > > >> >> > >>> > >> >> >from the producer metrics all dropping to 0.
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >Guozhang
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > > >> >> > >>> > >> >><jq...@linkedin.com.invalid>
> > > >> >> > >>> > >> >> >wrote:
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >> It seems there are two options we can choose
> from
> > > >>when
> > > >> >> > >>>close()
> > > >> >> > >>> is
> > > >> >> > >>> > >> >>called
> > > >> >> > >>> > >> >> >> from sender thread (callback):
> > > >> >> > >>> > >> >> >> 1. Log an error and close the producer using
> > > >>close(-1)
> > > >> >> > >>> > >> >> >> 2. Log an error and block.
> > > >> >> > >>> > >> >> >> (Throwing an exception will not work because we
> > > >>catch
> > > >> >>all
> > > >> >> > >>>the
> > > >> >> > >>> > >> >>exception
> > > >> >> > >>> > >> >> >> thrown from user callback. It will just lead to
> an
> > > >> >>error
> > > >> >> > >>>log.)
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> My concern for the first option is that the
> > producer
> > > >> >>will
> > > >> >> be
> > > >> >> > >>> > >>closed
> > > >> >> > >>> > >> >>even
> > > >> >> > >>> > >> >> >> if we logged and error. I am wondering if some
> > user
> > > >> >>would
> > > >> >> > >>>not
> > > >> >> > >>> even
> > > >> >> > >>> > >> >>take
> > > >> >> > >>> > >> >> >>a
> > > >> >> > >>> > >> >> >> look at the log if producer is closed normally.
> > > >>Because
> > > >> >> from
> > > >> >> > >>>the
> > > >> >> > >>> > >> >> >>programs
> > > >> >> > >>> > >> >> >> behavior, everything looks good. If that is the
> > > >>case,
> > > >> >>the
> > > >> >> > >>>error
> > > >> >> > >>> > >> >>message
> > > >> >> > >>> > >> >> >>we
> > > >> >> > >>> > >> >> >> logged probably will just be ignored until some
> > day
> > > >> >>when
> > > >> >> > >>>people
> > > >> >> > >>> > >>check
> > > >> >> > >>> > >> >> >>the
> > > >> >> > >>> > >> >> >> log and see it.
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> As for the second option, because producer does
> > not
> > > >> >>close
> > > >> >> > >>>but
> > > >> >> > >>> > >>blocks.
> > > >> >> > >>> > >> >> >>User
> > > >> >> > >>> > >> >> >> will notice this the first time they run the
> > > >>program.
> > > >> >>They
> > > >> >> > >>> > >>probably
> > > >> >> > >>> > >> >>will
> > > >> >> > >>> > >> >> >> look at the log to see why producer could not be
> > > >>closed
> > > >> >> and
> > > >> >> > >>>they
> > > >> >> > >>> > >>will
> > > >> >> > >>> > >> >> >>see
> > > >> >> > >>> > >> >> >> the error log we put there. So they will get
> > > >>informed
> > > >> >> about
> > > >> >> > >>>this
> > > >> >> > >>> > >> >> >>mis-usage
> > > >> >> > >>> > >> >> >> of close() in sender thread the first time they
> > run
> > > >>the
> > > >> >> code
> > > >> >> > >>> > >>instead
> > > >> >> > >>> > >> >>of
> > > >> >> > >>> > >> >> >> some time later.
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> Personally I prefer the second one because it is
> > > >>more
> > > >> >> > >>>obvious
> > > >> >> > >>> that
> > > >> >> > >>> > >> >> >> something was wrong.
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
> > > >> >><wangguoz@gmail.com
> > > >> >> <javascript:;>>
> > > >> >> > >>> wrote:
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >> >Yeah I agree we should not silently change the
> > > >> >>behavior
> > > >> >> of
> > > >> >> > >>>the
> > > >> >> > >>> > >> >>function
> > > >> >> > >>> > >> >> >> >with the given parameters; and I would prefer
> > > >> >> > >>> > >> >> >>error-logging-and-shutdown
> > > >> >> > >>> > >> >> >> >over blocking when close(>0) is used, since as
> > Neha
> > > >> >> > >>>suggested
> > > >> >> > >>> > >> >>blocking
> > > >> >> > >>> > >> >> >> >would also not proceed with sending any data,
> bu
> > > >>will
> > > >> >> just
> > > >> >> > >>>let
> > > >> >> > >>> > >> >>users to
> > > >> >> > >>> > >> >> >> >realize the issue later than sooner.
> > > >> >> > >>> > >> >> >> >
> > > >> >> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > > >> >> > >>> > >><neha@confluent.io <javascript:;>>
> > > >> >> > >>> > >> >> >>wrote:
> > > >> >> > >>> > >> >> >> >
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > And I also agree it is better if we can
> make
> > > >> >>producer
> > > >> >> > >>>block
> > > >> >> > >>> > >>when
> > > >> >> > >>> > >> >> >> >> > close() is called from sender thread so
> user
> > > >>will
> > > >> >> > >>>notice
> > > >> >> > >>> > >> >>something
> > > >> >> > >>> > >> >> >> >>went
> > > >> >> > >>> > >> >> >> >> > wrong.
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >> This isn't a great experience either. Why
> can't
> > > >>we
> > > >> >>just
> > > >> >> > >>>throw
> > > >> >> > >>> > >>an
> > > >> >> > >>> > >> >> >> >>exception
> > > >> >> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd
> > like
> > > >> >>the
> > > >> >> > >>>user to
> > > >> >> > >>> > >> >>know.
> > > >> >> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong
> > and
> > > >> >> > >>>annoying.
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > > >> >> > >>> > >><jay.kreps@gmail.com <javascript:;>>
> > > >> >> > >>> > >> >> >> wrote:
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >> > Cool.
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > I think blocking is good or alternately
> > > >>throwing
> > > >> >>an
> > > >> >> > >>> exception
> > > >> >> > >>> > >> >> >>directly
> > > >> >> > >>> > >> >> >> >> from
> > > >> >> > >>> > >> >> >> >> > close(). Basically I would just worry about
> > > >>subtly
> > > >> >> > >>>doing
> > > >> >> > >>> > >> >>something
> > > >> >> > >>> > >> >> >> >> slightly
> > > >> >> > >>> > >> >> >> >> > different from what the user asked for as
> it
> > > >>will
> > > >> >>be
> > > >> >> > >>>hard
> > > >> >> > >>> to
> > > >> >> > >>> > >> >>notice
> > > >> >> > >>> > >> >> >> >>that
> > > >> >> > >>> > >> >> >> >> > behavior difference.
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > -Jay
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie
> Qin
> > > >> >> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > wrote:
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >> > > Hi Jay,
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > I have modified the KIP as you
> suggested. I
> > > >> >>thinks
> > > >> >> as
> > > >> >> > >>> long
> > > >> >> > >>> > >>as
> > > >> >> > >>> > >> >>we
> > > >> >> > >>> > >> >> >> >>have
> > > >> >> > >>> > >> >> >> >> > > consistent define for timeout across
> Kafka
> > > >> >> interface,
> > > >> >> > >>> there
> > > >> >> > >>> > >> >>would
> > > >> >> > >>> > >> >> >> >>be no
> > > >> >> > >>> > >> >> >> >> > > problem. And I also agree it is better if
> > we
> > > >>can
> > > >> >> make
> > > >> >> > >>> > >>producer
> > > >> >> > >>> > >> >> >>block
> > > >> >> > >>> > >> >> >> >> when
> > > >> >> > >>> > >> >> >> >> > > close() is called from sender thread so
> > user
> > > >> >>will
> > > >> >> > >>>notice
> > > >> >> > >>> > >> >> >>something
> > > >> >> > >>> > >> >> >> >>went
> > > >> >> > >>> > >> >> >> >> > > wrong.
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > Thanks.
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> > > >> >> > >>><jay.kreps@gmail.com <javascript:;>>
> > > >> >> > >>> > >> wrote:
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > >Hey Jiangjie,
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >I think this is going to be very
> confusing
> > > >>that
> > > >> >> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> > > >> >> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> > > >> >> > >>> > >> >> >> >> > > >I understand this appears in other apis,
> > > >>but it
> > > >> >> is a
> > > >> >> > >>> > >>constant
> > > >> >> > >>> > >> >> >> >>cause of
> > > >> >> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
> > > >>need a
> > > >> >> way
> > > >> >> > >>>to
> > > >> >> > >>> > >>wait
> > > >> >> > >>> > >> >> >> >> indefinitely
> > > >> >> > >>> > >> >> >> >> > > >as
> > > >> >> > >>> > >> >> >> >> > > >we already have close() so having a
> > magical
> > > >> >> constant
> > > >> >> > >>>for
> > > >> >> > >>> > >> >>that is
> > > >> >> > >>> > >> >> >> >> > > >redundant.
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
> > > >>already
> > > >> >> > >>>possible
> > > >> >> > >>> > >>and
> > > >> >> > >>> > >> >> >>would
> > > >> >> > >>> > >> >> >> >> block
> > > >> >> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
> > > >>change
> > > >> >> the
> > > >> >> > >>> > >>behavior
> > > >> >> > >>> > >> >>is
> > > >> >> > >>> > >> >> >> >> probably
> > > >> >> > >>> > >> >> >> >> > > >not right. I.e. if the user calls
> close()
> > in
> > > >> >>the
> > > >> >> > >>> callback
> > > >> >> > >>> > >> >>there
> > > >> >> > >>> > >> >> >>is
> > > >> >> > >>> > >> >> >> >> > > >actually
> > > >> >> > >>> > >> >> >> >> > > >some misunderstanding and they need to
> > think
> > > >> >>more,
> > > >> >> > >>> > >>silently
> > > >> >> > >>> > >> >> >>making
> > > >> >> > >>> > >> >> >> >> this
> > > >> >> > >>> > >> >> >> >> > > >not
> > > >> >> > >>> > >> >> >> >> > > >block will hide the problem from them
> > which
> > > >>is
> > > >> >>the
> > > >> >> > >>> > >>opposite
> > > >> >> > >>> > >> >>of
> > > >> >> > >>> > >> >> >> >>what we
> > > >> >> > >>> > >> >> >> >> > > >want.
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >-Jay
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM,
> Jiangjie
> > > >>Qin
> > > >> >> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > > >wrote:
> > > >> >> > >>> > >> >> >> >> > > >
> > > >> >> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
> > > >>thread.
> > > >> >> > >>>Since it
> > > >> >> > >>> > >> >>seems
> > > >> >> > >>> > >> >> >>we
> > > >> >> > >>> > >> >> >> >> > probably
> > > >> >> > >>> > >> >> >> >> > > >> will have more discussion on this, I
> am
> > > >>just
> > > >> >> > >>>replying
> > > >> >> > >>> > >>from
> > > >> >> > >>> > >> >>the
> > > >> >> > >>> > >> >> >> >> > > >>discussion
> > > >> >> > >>> > >> >> >> >> > > >> thread here.
> > > >> >> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it
> > less
> > > >> >>like
> > > >> >> > >>> > >>half-baked,
> > > >> >> > >>> > >> >> >> >>apologize
> > > >> >> > >>> > >> >> >> >> > for
> > > >> >> > >>> > >> >> >> >> > > >> the rush...
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> > > >> >> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
> > > >>either
> > > >> >> are
> > > >> >> > >>>sent
> > > >> >> > >>> > >>or
> > > >> >> > >>> > >> >> >>reach
> > > >> >> > >>> > >> >> >> >> > request
> > > >> >> > >>> > >> >> >> >> > > >> timeout.
> > > >> >> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS)
> -
> > > >>close
> > > >> >> > >>> > >>immediately
> > > >> >> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> > > >> >> equivalent
> > > >> >> > >>>to
> > > >> >> > >>> > >> >>close(),
> > > >> >> > >>> > >> >> >> >>i.e.
> > > >> >> > >>> > >> >> >> >> > Wait
> > > >> >> > >>> > >> >> >> >> > > >> until all requests are sent or reach
> > > >>request
> > > >> >> > >>>timeout
> > > >> >> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) -
> > try
> > > >> >>the
> > > >> >> > >>>best to
> > > >> >> > >>> > >> >>finish
> > > >> >> > >>> > >> >> >> >> sending
> > > >> >> > >>> > >> >> >> >> > > >>in 5
> > > >> >> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong,
> > > >>just
> > > >> >> > >>>shutdown
> > > >> >> > >>> the
> > > >> >> > >>> > >> >> >>producer
> > > >> >> > >>> > >> >> >> >> > > >>anyway,
> > > >> >> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> About how we define what timeout value
> > > >>stands
> > > >> >> for,
> > > >> >> > >>>I
> > > >> >> > >>> > >> >>actually
> > > >> >> > >>> > >> >> >> >> > struggled
> > > >> >> > >>> > >> >> >> >> > > >>a
> > > >> >> > >>> > >> >> >> >> > > >> little bit when wrote the patch.
> > > >>Intuitively,
> > > >> >> > >>>close(0)
> > > >> >> > >>> > >> >>should
> > > >> >> > >>> > >> >> >> >>mean
> > > >> >> > >>> > >> >> >> >> > > >> immediately, however it seems that all
> > the
> > > >> >> > >>>existing
> > > >> >> > >>> java
> > > >> >> > >>> > >> >>class
> > > >> >> > >>> > >> >> >> >>have
> > > >> >> > >>> > >> >> >> >> > this
> > > >> >> > >>> > >> >> >> >> > > >> convention of timeout=0 means no
> timeout
> > > >>or
> > > >> >> never
> > > >> >> > >>> > >>timeout
> > > >> >> > >>> > >> >> >> >> > > >>(Thread.join(0),
> > > >> >> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the
> > dilemma
> > > >>is
> > > >> >> > >>>either we
> > > >> >> > >>> > >> >>follow
> > > >> >> > >>> > >> >> >>the
> > > >> >> > >>> > >> >> >> >> > > >> intuition or we follow the convention.
> > > >>What I
> > > >> >> > >>>chose is
> > > >> >> > >>> > >>to
> > > >> >> > >>> > >> >> >>follow
> > > >> >> > >>> > >> >> >> >>the
> > > >> >> > >>> > >> >> >> >> > > >> convention but document the interface
> to
> > > >>let
> > > >> >> user
> > > >> >> > >>>be
> > > >> >> > >>> > >>aware
> > > >> >> > >>> > >> >>of
> > > >> >> > >>> > >> >> >>the
> > > >> >> > >>> > >> >> >> >> > usage.
> > > >> >> > >>> > >> >> >> >> > > >> The reason is that I think
> > > >>producer.close()
> > > >> >>is a
> > > >> >> > >>> public
> > > >> >> > >>> > >> >> >> >>interface so
> > > >> >> > >>> > >> >> >> >> > it
> > > >> >> > >>> > >> >> >> >> > > >> might be better to follow java
> > convention.
> > > >> >> Whereas
> > > >> >> > >>> > >> >>selector is
> > > >> >> > >>> > >> >> >> >>not a
> > > >> >> > >>> > >> >> >> >> > > >> public interface that used by user, so
> > as
> > > >> >>long
> > > >> >> as
> > > >> >> > >>>it
> > > >> >> > >>> > >>makes
> > > >> >> > >>> > >> >> >>sense
> > > >> >> > >>> > >> >> >> >>to
> > > >> >> > >>> > >> >> >> >> > us,
> > > >> >> > >>> > >> >> >> >> > > >>it
> > > >> >> > >>> > >> >> >> >> > > >> is less a problem to be different from
> > > >>java
> > > >> >> > >>> convention.
> > > >> >> > >>> > >> >>That
> > > >> >> > >>> > >> >> >>said
> > > >> >> > >>> > >> >> >> >> > since
> > > >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a
> public
> > > >> >> interface,
> > > >> >> > >>>I
> > > >> >> > >>> > >>think
> > > >> >> > >>> > >> >>it
> > > >> >> > >>> > >> >> >> >>also
> > > >> >> > >>> > >> >> >> >> > makes
> > > >> >> > >>> > >> >> >> >> > > >> sense to make producer.close() to have
> > the
> > > >> >>same
> > > >> >> > >>> > >>definition
> > > >> >> > >>> > >> >>of
> > > >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> The main argument for keeping a
> timeout
> > in
> > > >> >>close
> > > >> >> > >>>would
> > > >> >> > >>> > >>be
> > > >> >> > >>> > >> >> >> >>separating
> > > >> >> > >>> > >> >> >> >> > the
> > > >> >> > >>> > >> >> >> >> > > >> close timeout from request timeout,
> > which
> > > >> >> probably
> > > >> >> > >>> makes
> > > >> >> > >>> > >> >> >>sense. I
> > > >> >> > >>> > >> >> >> >> > would
> > > >> >> > >>> > >> >> >> >> > > >> guess typically the request timeout
> > would
> > > >>be
> > > >> >> long
> > > >> >> > >>> (e.g.
> > > >> >> > >>> > >>60
> > > >> >> > >>> > >> >> >> >>seconds)
> > > >> >> > >>> > >> >> >> >> > > >> because we might want to consider
> > retries
> > > >> >>with
> > > >> >> > >>>back
> > > >> >> > >>> off
> > > >> >> > >>> > >> >>time.
> > > >> >> > >>> > >> >> >>If
> > > >> >> > >>> > >> >> >> >>we
> > > >> >> > >>> > >> >> >> >> > have
> > > >> >> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in
> > worst
> > > >> >>case
> > > >> >> > >>>that
> > > >> >> > >>> > >>could
> > > >> >> > >>> > >> >>take
> > > >> >> > >>> > >> >> >> >>up to
> > > >> >> > >>> > >> >> >> >> > > >> several minutes to complete all the
> > > >>requests.
> > > >> >> But
> > > >> >> > >>>when
> > > >> >> > >>> > >>we
> > > >> >> > >>> > >> >> >>close a
> > > >> >> > >>> > >> >> >> >> > > >> producer, we might not want to wait
> for
> > > >>that
> > > >> >> long
> > > >> >> > >>>as
> > > >> >> > >>> it
> > > >> >> > >>> > >> >>might
> > > >> >> > >>> > >> >> >> >>cause
> > > >> >> > >>> > >> >> >> >> > some
> > > >> >> > >>> > >> >> >> >> > > >> other problem like deployment tool
> > > >>timeout.
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> There is also a subtle difference
> > between
> > > >> >> > >>> close(timeout)
> > > >> >> > >>> > >> >>and
> > > >> >> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
> > > >>flush()
> > > >> >>is
> > > >> >> to
> > > >> >> > >>> write
> > > >> >> > >>> > >> >>data
> > > >> >> > >>> > >> >> >>to
> > > >> >> > >>> > >> >> >> >>the
> > > >> >> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to
> > wait
> > > >> >>until
> > > >> >> > >>> request
> > > >> >> > >>> > >> >> >>timeout.
> > > >> >> > >>> > >> >> >> >>I
> > > >> >> > >>> > >> >> >> >> > think
> > > >> >> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks
> > strange.
> > > >>On
> > > >> >>the
> > > >> >> > >>>other
> > > >> >> > >>> > >> >>hand,
> > > >> >> > >>> > >> >> >>the
> > > >> >> > >>> > >> >> >> >>top
> > > >> >> > >>> > >> >> >> >> > > >> priority for close() is to close the
> > > >>producer
> > > >> >> > >>>rather
> > > >> >> > >>> > >>than
> > > >> >> > >>> > >> >> >>flush()
> > > >> >> > >>> > >> >> >> >> > data,
> > > >> >> > >>> > >> >> >> >> > > >>so
> > > >> >> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on
> > bounded
> > > >> >> waiting
> > > >> >> > >>>for
> > > >> >> > >>> > >>its
> > > >> >> > >>> > >> >>main
> > > >> >> > >>> > >> >> >> >>job.
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> Sorry for the confusion about
> forceClose
> > > >> >>flag.
> > > >> >> It
> > > >> >> > >>>is
> > > >> >> > >>> > >>not a
> > > >> >> > >>> > >> >> >>public
> > > >> >> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
> > > >>Changes
> > > >> >> > >>>section
> > > >> >> > >>> > >> >>which I
> > > >> >> > >>> > >> >> >> >> thought
> > > >> >> > >>> > >> >> >> >> > > >>was
> > > >> >> > >>> > >> >> >> >> > > >> supposed to provide implementation
> > > >>details.
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> > > >> >> suggestions!
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> > > >> >> > >>> jqin@linkedin.com <javascript:;>>
> > > >> >> > >>> > >> >> wrote:
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >> >The KIP page has been updated per
> Jay¹s
> > > >> >> comments.
> > > >> >> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting
> process
> > > >>if
> > > >> >>no
> > > >> >> > >>>further
> > > >> >> > >>> > >> >> >>comments
> > > >> >> > >>> > >> >> >> >>are
> > > >> >> > >>> > >> >> >> >> > > >> >received by tomorrow.
> > > >> >> > >>> > >> >> >> >> > > >> >
> > > >> >> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >> >> >> > > >> >
> > > >> >> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> > > >> >> > >>><jay.kreps@gmail.com <javascript:;>
> > > >> >> > >>> >
> > > >> >> > >>> > >> >>wrote:
> > > >> >> > >>> > >> >> >> >> > > >> >
> > > >> >> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation
> > and
> > > >>use
> > > >> >> > >>>cases
> > > >> >> > >>> for
> > > >> >> > >>> > >>the
> > > >> >> > >>> > >> >> >> >>feature?
> > > >> >> > >>> > >> >> >> >> > > >>This
> > > >> >> > >>> > >> >> >> >> > > >> >>mentions your interest in having a
> way
> > > >>of
> > > >> >> > >>>aborting
> > > >> >> > >>> > >>from
> > > >> >> > >>> > >> >> >>inside
> > > >> >> > >>> > >> >> >> >>the
> > > >> >> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really
> > explain
> > > >> >>that
> > > >> >> > >>>usage
> > > >> >> > >>> or
> > > >> >> > >>> > >>why
> > > >> >> > >>> > >> >> >>other
> > > >> >> > >>> > >> >> >> >> > people
> > > >> >> > >>> > >> >> >> >> > > >> >>would want to do that. It also
> doesn't
> > > >>list
> > > >> >> the
> > > >> >> > >>> > >>primary
> > > >> >> > >>> > >> >>use
> > > >> >> > >>> > >> >> >> >>case
> > > >> >> > >>> > >> >> >> >> for
> > > >> >> > >>> > >> >> >> >> > > >> >>having
> > > >> >> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which
> was
> > > >>to
> > > >> >> avoid
> > > >> >> > >>> > >>blocking
> > > >> >> > >>> > >> >>too
> > > >> >> > >>> > >> >> >> >>long
> > > >> >> > >>> > >> >> >> >> on
> > > >> >> > >>> > >> >> >> >> > > >> >>shutdown.
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>-Jay
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
> > > >>Jiangjie
> > > >> >>Qin
> > > >> >> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > > >> >> > >>> > >> >> >> >> > > >> >
> > > >> >> > >>> > >> >> >> >> > > >> >>wrote:
> > > >> >> > >>> > >> >> >> >> > > >> >>
> > > >> >> > >>> > >> >> >> >> > > >> >>> Hi,
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> > > >> >> > >>>close(timeout)
> > > >> >> > >>> to
> > > >> >> > >>> > >>new
> > > >> >> > >>> > >> >> >> >> producer.
> > > >> >> > >>> > >> >> >> >> > > >>Most
> > > >> >> > >>> > >> >> >> >> > > >> >>>of
> > > >> >> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
> > > >> >>KAFKA-1660
> > > >> >> > >>>where
> > > >> >> > >>> > >>Parth
> > > >> >> > >>> > >> >> >> >> Brahmbhatt
> > > >> >> > >>> > >> >> >> >> > > >>has
> > > >> >> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> > > >> >> > >>> > >> >> >> >> > > >> >>> Since this is an interface change
> so
> > > >>we
> > > >> >>are
> > > >> >> > >>>going
> > > >> >> > >>> > >> >>through
> > > >> >> > >>> > >> >> >>the
> > > >> >> > >>> > >> >> >> >> KIP
> > > >> >> > >>> > >> >> >> >> > > >> >>>process.
> > > >> >> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >>
> > > >> >> > >>> >
> > > >> >> > >>>
> > > >> >> >
> > > >>
> > > >>>>
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=537397
> > > >>>>8
> > > >> >> > >>> > >> >> >> >> > > >> >>>2
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >> >>> Thanks.
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > > >> >> > >>> > >> >> >> >> > > >> >>>
> > > >> >> > >>> > >> >> >> >> > > >> >
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > > >>
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> > >
> > > >> >> > >>> > >> >> >> >> >
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >> --
> > > >> >> > >>> > >> >> >> >> Thanks,
> > > >> >> > >>> > >> >> >> >> Neha
> > > >> >> > >>> > >> >> >> >>
> > > >> >> > >>> > >> >> >> >
> > > >> >> > >>> > >> >> >> >
> > > >> >> > >>> > >> >> >> >
> > > >> >> > >>> > >> >> >> >--
> > > >> >> > >>> > >> >> >> >-- Guozhang
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >>
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >
> > > >> >> > >>> > >> >> >--
> > > >> >> > >>> > >> >> >-- Guozhang
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >>
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >
> > > >> >> > >>> > >> >--
> > > >> >> > >>> > >> >-- Guozhang
> > > >> >> > >>> > >>
> > > >> >> > >>> > >>
> > > >> >> > >>> >
> > > >> >> > >>> >
> > > >> >> > >>>
> > > >> >> > >>
> > > >> >> > >>
> > > >> >> > >>
> > > >> >> > >>--
> > > >> >> > >>-- Guozhang
> > > >> >> > >
> > > >> >> >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >> >
> > > >> >--
> > > >> >Sent from Gmail Mobile
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

With this approach close(timeout > 0) will not actually close by the
elapsed timeout right? This semantics mismatch is a bit concerning to me..

Guozhang

On Fri, Mar 20, 2015 at 2:37 PM, Jun Rao <ju...@confluent.io> wrote:

> Yes, that probably is simpler. Basically, close() and close(timeout) will
> always wait for the sender thread to complete. Close(0) will just initiate
> the shutdown of the sender thread, but not waiting for the thread to
> complete.
>
> Thanks,
>
> Jun
>
> On Fri, Mar 20, 2015 at 11:29 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > That’s a very good point.
> > Currently if timeout > 0, it will wait up to timeout for the sender
> thread
> > to complete. After that it will do a force close but not block.
> >
> > Maybe we should make it to be:
> > If timeout > 0, wait up to timeout for the sender thread to complete.
> > After that, make a force close but still try to join the sender thread.
> > If timeout = 0, it is non-blocking.
> >
> > Jiangjie (Becket) Qin
> >
> > On 3/20/15, 9:16 AM, "Jun Rao" <ju...@confluent.io> wrote:
> >
> > >Got it. Just to clarify, does close(timeout) always wait for the sender
> > >thread to complete?
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >On Fri, Mar 20, 2015 at 7:41 AM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > >wrote:
> > >
> > >> I think the we agreed that we are going to log an error and block. By
> > >> doing this we can make sure the error log to be checked by user in all
> > >> cases.
> > >> If we silently replace close() to close(0) in sender thread, in some
> > >>cases
> > >> such as send error during a normal close(), user might not notice
> > >> something went wrong.
> > >>
> > >> On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:
> > >>
> > >> >close can probably just getCurrentThread and check if ==
> senderThread.
> > >> >I'm actually not sure from this thread if there was clear agreement
> on
> > >> >whether it should change close(timeout)/close() to close(0) or if it
> > >> >should
> > >> >log an error and block up to the timeout.
> > >> >
> > >> >On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
> > >> >
> > >> >> So in (1), if a close() or close(timeout) is called from a
> callback,
> > >>we
> > >> >> will just turn that into a close(0)? Implementation wise, how do we
> > >>know
> > >> >> whether a close() call is made from the sender thread or not?
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> Jun
> > >> >>
> > >> >> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
> > >> >><jq...@linkedin.com.invalid>
> > >> >> wrote:
> > >> >>
> > >> >> > It looks we have another option and are now deciding between the
> > >> >> following
> > >> >> > two interfaces:
> > >> >> >
> > >> >> > 1. Close() + close(timeout)
> > >> >> >   - timeout could be either positive or zero.
> > >> >> >   - only close(0) can be called from sender thread
> > >> >> >
> > >> >> > 2. Close() + abort() + close(timeout)
> > >> >> >   - timeout can either be positive or zero
> > >> >> >   - only abort() can be called from sender thread
> > >> >> >
> > >> >> >   - abort() is equivalent to close(0) in 1) but does not join
> > >>sender
> > >> >> > thread and does not close metrics.
> > >> >> >   - Another thread has to call close() or close(timeout) in order
> > >>to
> > >> >>make
> > >> >> > sure the resources in producer are gone.
> > >> >> >
> > >> >> > The tow approach provides the same function we need, the
> > >>difference is
> > >> >> > approach 2) follows convention of close() and abort(). On the
> other
> > >> >>hand,
> > >> >> > approach 1) saves one interface compared with approach 2) but
> does
> > >>not
> > >> >> > follow the convention.
> > >> >> >
> > >> >> > When the two approaches come to user code, it is probably
> something
> > >> >>like
> > >> >> > this:
> > >> >> >
> > >> >> > Try {
> > >> >> >   While(!finished)
> > >> >> >     Producer.send(record, callback)
> > >> >> > } catch (Exception e) {
> > >> >> >   Producer.close(5)
> > >> >> > }
> > >> >> >
> > >> >> > Class CallbackImpl implements Callback {
> > >> >> >   onCompletion(RecordMetadata metadata Exception e) {
> > >> >> >     If (e != null)
> > >> >> >       Abort() / close()
> > >> >> >   }
> > >> >> > }
> > >> >> >
> > >> >> > Because the two approach leads to almost the same user code,
> > >>assuming
> > >> >> > users are always calling producer.close() as a clean up step,
> > >> >>personally
> > >> >> I
> > >> >> > prefer approach 2) as it follows convention.
> > >> >> >
> > >> >> > Any thoughts?
> > >> >> >
> > >> >> > Jiangjie (Becket) Qin
> > >> >> >
> > >> >> >
> > >> >> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
> > >> >><javascript:;>>
> > >> >> wrote:
> > >> >> >
> > >> >> > >Hi Jun,
> > >> >> > >
> > >> >> > >Yes, as Guozhang said, the main reason we set a flag is because
> > >> >>close(0)
> > >> >> > >is expected to be called by sender thread itself.
> > >> >> > >If we want to maintain the semantic meaning of close(), one
> > >> >>alternative
> > >> >> is
> > >> >> > >to have an abort() method does the same thing as close(0) except
> > >> >> cleanup.
> > >> >> > >And in close(timeout), after timeout we call abort() and join
> the
> > >> >>sender
> > >> >> > >thread. This was one of the previous proposal. We merged abort
> to
> > >> >> close(0)
> > >> >> > >because they are almost doing the same thing. But from what you
> > >> >> mentioned,
> > >> >> > >it might make sense to have two separate methods.
> > >> >> > >
> > >> >> > >Thanks.
> > >> >> > >
> > >> >> > >Jiangjie (Becket) Qin
> > >> >> > >
> > >> >> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
> > >> >> <javascript:;>> wrote:
> > >> >> > >
> > >> >> > >>Yeah in this sense the sender thread will not exist immediately
> > >>in
> > >> >>the
> > >> >> > >>close(0) call, but will only terminate after the current
> response
> > >> >>batch
> > >> >> > >>has
> > >> >> > >>been processed, as will the producer instance itself.
> > >> >> > >>
> > >> >> > >>There is a reason for this though: for a clean shutdown the
> > >>caller
> > >> >> thread
> > >> >> > >>has to wait for the sender thread to join before closing the
> > >> >>producer
> > >> >> > >>instance, but this cannot be achieve if close(0) is called by
> the
> > >> >> sender
> > >> >> > >>thread itself (for example in KAFKA-1659, there is a proposal
> > >>from
> > >> >> Andrew
> > >> >> > >>Stein on using thread.interrupt and thread.stop, but if it is
> > >> >>called by
> > >> >> > >>the
> > >> >> > >>ioThread itself the stop call will fail). Hence we came up with
> > >>the
> > >> >> flag
> > >> >> > >>approach to let the sender thread to close as soon as it is at
> > >>the
> > >> >> > >>barrier
> > >> >> > >>of the run loop.
> > >> >> > >>
> > >> >> > >>Guozhang
> > >> >> > >>
> > >> >> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
> > >> >> <javascript:;>> wrote:
> > >> >> > >>
> > >> >> > >>> Hmm, does that mean that after close(0), the sender thread is
> > >>not
> > >> >> > >>>necessary
> > >> >> > >>> gone? Normally, after closing an entity, we expect all
> internal
> > >> >> threads
> > >> >> > >>> associated with the entity are shut down completely.
> > >> >> > >>>
> > >> >> > >>> Thanks,
> > >> >> > >>>
> > >> >> > >>> Jun
> > >> >> > >>>
> > >> >> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> > >> >> > >>><jq...@linkedin.com.invalid>
> > >> >> > >>> wrote:
> > >> >> > >>>
> > >> >> > >>> > Hi Jun,
> > >> >> > >>> >
> > >> >> > >>> > Close(0) will set two flags in sender. Running=false and a
> > >>newly
> > >> >> > >>>added
> > >> >> > >>> > forceClose=true. It will also set accumulator.closed=true
> so
> > >>no
> > >> >> > >>>further
> > >> >> > >>> > producer.send() will succeed.
> > >> >> > >>> > The sender thread will finish executing all the callbacks
> in
> > >> >> current
> > >> >> > >>> batch
> > >> >> > >>> > of responses, then it will see the forceClose flag. It will
> > >>just
> > >> >> fail
> > >> >> > >>>all
> > >> >> > >>> > the incomplete batches in the producer and exit.
> > >> >> > >>> > So close(0) is a non-blocking call and sender thread will
> not
> > >> >>try
> > >> >> to
> > >> >> > >>>join
> > >> >> > >>> > itself in close(0).
> > >> >> > >>> >
> > >> >> > >>> > Thanks.
> > >> >> > >>> >
> > >> >> > >>> > Jiangjie (Becket) Qin
> > >> >> > >>> >
> > >> >> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io
> > >> <javascript:;>>
> > >> >> wrote:
> > >> >> > >>> >
> > >> >> > >>> > >How does close(0) work if it's called from the sender
> > >>thread?
> > >> >>If
> > >> >> > >>> close(0)
> > >> >> > >>> > >needs to wait for the sender thread to join, wouldn't this
> > >> >>cause a
> > >> >> > >>> > >deadlock?
> > >> >> > >>> > >
> > >> >> > >>> > >Thanks,
> > >> >> > >>> > >
> > >> >> > >>> > >Jun
> > >> >> > >>> > >
> > >> >> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> > >> >> > >>><jqin@linkedin.com.invalid
> > >> >> > >>> >
> > >> >> > >>> > >wrote:
> > >> >> > >>> > >
> > >> >> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
> > >> >>without
> > >> >> > >>> > >> discussing with you :)
> > >> >> > >>> > >>
> > >> >> > >>> > >> Jiangjie (Becket) Qin
> > >> >> > >>> > >>
> > >> >> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <
> wangguoz@gmail.com
> > >> >> <javascript:;>> wrote:
> > >> >> > >>> > >>
> > >> >> > >>> > >> >Thanks Jiangjie,
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >After talking to you offline on this, I have been
> > >>convinced
> > >> >>and
> > >> >> > >>> > >>changed my
> > >> >> > >>> > >> >preference to blocking. The immediate shutdown approach
> > >>does
> > >> >> have
> > >> >> > >>> some
> > >> >> > >>> > >> >unsafeness in some cases.
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >Guozhang
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > >> >> > >>> > >><jqin@linkedin.com.invalid
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >wrote:
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >> It looks that the problem we want to solve and the
> > >> >>purpose we
> > >> >> > >>>want
> > >> >> > >>> to
> > >> >> > >>> > >> >> achieve is:
> > >> >> > >>> > >> >> If user uses close() in callback, we want to let user
> > >>be
> > >> >> aware
> > >> >> > >>>that
> > >> >> > >>> > >>they
> > >> >> > >>> > >> >> should use close(0) instead of close() in the
> callback.
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> We have agreed that we will have an error log to
> inform
> > >> >>user
> > >> >> > >>>about
> > >> >> > >>> > >>this
> > >> >> > >>> > >> >> mis-usage. The options differ in the way how we can
> > >>force
> > >> >> user
> > >> >> > >>>to
> > >> >> > >>> > >>take a
> > >> >> > >>> > >> >> look at that error log.
> > >> >> > >>> > >> >> There are two scenarios:
> > >> >> > >>> > >> >> 1. User does not expect the program to exit.
> > >> >> > >>> > >> >> 2. User expect the program to exit.
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> For scenario 1), blocking will probably delay the
> > >> >>discovery
> > >> >> of
> > >> >> > >>>the
> > >> >> > >>> > >> >> problem. Calling close(0) exposes the problem
> quicker.
> > >>In
> > >> >> this
> > >> >> > >>> > >>scenario
> > >> >> > >>> > >> >> producer just encounter a send failure when running
> > >> >>normally.
> > >> >> > >>> > >> >> For scenario 2), blocking will expose the problem
> > >>quick.
> > >> >> > >>>Calling
> > >> >> > >>> > >> >>close(-1)
> > >> >> > >>> > >> >> might hide the problem. This scenario might include:
> a)
> > >> >>Unit
> > >> >> > >>>test
> > >> >> > >>> > >>for a
> > >> >> > >>> > >> >> send failure. b) Message sending during a close()
> call
> > >> >>from a
> > >> >> > >>>user
> > >> >> > >>> > >> >>thread.
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> So as a summary table:
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >>                   Scenario 1)
> > >> >>  Scenario
> > >> >> > >>>2)
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> Blocking      Delay problem discovery
> > >>Guaranteed
> > >> >> > >>>problem
> > >> >> > >>> > >> >>discovery
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
> > >> >>might
> > >> >> be
> > >> >> > >>> hidden
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> Personally I prefer blocking because it seems
> providing
> > >> >>more
> > >> >> > >>> > >>guarantees
> > >> >> > >>> > >> >> and safer.
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> Thanks.
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> Jiangjie (Becket) Qin
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
> > >><wangguoz@gmail.com
> > >> >> <javascript:;>>
> > >> >> > >>>wrote:
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >> >HI Jiangjie,
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >As far as I understand calling close() in the
> > >>ioThread is
> > >> >> not
> > >> >> > >>> > >>common,
> > >> >> > >>> > >> >>as
> > >> >> > >>> > >> >> >it
> > >> >> > >>> > >> >> >may only trigger when we saw some non-retriable
> error.
> > >> >>Hence
> > >> >> > >>>when
> > >> >> > >>> > >>user
> > >> >> > >>> > >> >>run
> > >> >> > >>> > >> >> >their program it is unlikely that close() will be
> > >> >>triggered
> > >> >> > >>>and
> > >> >> > >>> > >>problem
> > >> >> > >>> > >> >> >will be detected. So it seems to me that from the
> > >>error
> > >> >> > >>>detection
> > >> >> > >>> > >> >>aspect
> > >> >> > >>> > >> >> >these two options seems to be the same as people
> will
> > >> >> usually
> > >> >> > >>> > >>detect it
> > >> >> > >>> > >> >> >from the producer metrics all dropping to 0.
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >Guozhang
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > >> >> > >>> > >> >><jq...@linkedin.com.invalid>
> > >> >> > >>> > >> >> >wrote:
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >> It seems there are two options we can choose from
> > >>when
> > >> >> > >>>close()
> > >> >> > >>> is
> > >> >> > >>> > >> >>called
> > >> >> > >>> > >> >> >> from sender thread (callback):
> > >> >> > >>> > >> >> >> 1. Log an error and close the producer using
> > >>close(-1)
> > >> >> > >>> > >> >> >> 2. Log an error and block.
> > >> >> > >>> > >> >> >> (Throwing an exception will not work because we
> > >>catch
> > >> >>all
> > >> >> > >>>the
> > >> >> > >>> > >> >>exception
> > >> >> > >>> > >> >> >> thrown from user callback. It will just lead to an
> > >> >>error
> > >> >> > >>>log.)
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> My concern for the first option is that the
> producer
> > >> >>will
> > >> >> be
> > >> >> > >>> > >>closed
> > >> >> > >>> > >> >>even
> > >> >> > >>> > >> >> >> if we logged and error. I am wondering if some
> user
> > >> >>would
> > >> >> > >>>not
> > >> >> > >>> even
> > >> >> > >>> > >> >>take
> > >> >> > >>> > >> >> >>a
> > >> >> > >>> > >> >> >> look at the log if producer is closed normally.
> > >>Because
> > >> >> from
> > >> >> > >>>the
> > >> >> > >>> > >> >> >>programs
> > >> >> > >>> > >> >> >> behavior, everything looks good. If that is the
> > >>case,
> > >> >>the
> > >> >> > >>>error
> > >> >> > >>> > >> >>message
> > >> >> > >>> > >> >> >>we
> > >> >> > >>> > >> >> >> logged probably will just be ignored until some
> day
> > >> >>when
> > >> >> > >>>people
> > >> >> > >>> > >>check
> > >> >> > >>> > >> >> >>the
> > >> >> > >>> > >> >> >> log and see it.
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> As for the second option, because producer does
> not
> > >> >>close
> > >> >> > >>>but
> > >> >> > >>> > >>blocks.
> > >> >> > >>> > >> >> >>User
> > >> >> > >>> > >> >> >> will notice this the first time they run the
> > >>program.
> > >> >>They
> > >> >> > >>> > >>probably
> > >> >> > >>> > >> >>will
> > >> >> > >>> > >> >> >> look at the log to see why producer could not be
> > >>closed
> > >> >> and
> > >> >> > >>>they
> > >> >> > >>> > >>will
> > >> >> > >>> > >> >> >>see
> > >> >> > >>> > >> >> >> the error log we put there. So they will get
> > >>informed
> > >> >> about
> > >> >> > >>>this
> > >> >> > >>> > >> >> >>mis-usage
> > >> >> > >>> > >> >> >> of close() in sender thread the first time they
> run
> > >>the
> > >> >> code
> > >> >> > >>> > >>instead
> > >> >> > >>> > >> >>of
> > >> >> > >>> > >> >> >> some time later.
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> Personally I prefer the second one because it is
> > >>more
> > >> >> > >>>obvious
> > >> >> > >>> that
> > >> >> > >>> > >> >> >> something was wrong.
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> Jiangjie (Becket) Qin
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
> > >> >><wangguoz@gmail.com
> > >> >> <javascript:;>>
> > >> >> > >>> wrote:
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >> >Yeah I agree we should not silently change the
> > >> >>behavior
> > >> >> of
> > >> >> > >>>the
> > >> >> > >>> > >> >>function
> > >> >> > >>> > >> >> >> >with the given parameters; and I would prefer
> > >> >> > >>> > >> >> >>error-logging-and-shutdown
> > >> >> > >>> > >> >> >> >over blocking when close(>0) is used, since as
> Neha
> > >> >> > >>>suggested
> > >> >> > >>> > >> >>blocking
> > >> >> > >>> > >> >> >> >would also not proceed with sending any data, bu
> > >>will
> > >> >> just
> > >> >> > >>>let
> > >> >> > >>> > >> >>users to
> > >> >> > >>> > >> >> >> >realize the issue later than sooner.
> > >> >> > >>> > >> >> >> >
> > >> >> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > >> >> > >>> > >><neha@confluent.io <javascript:;>>
> > >> >> > >>> > >> >> >>wrote:
> > >> >> > >>> > >> >> >> >
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > And I also agree it is better if we can make
> > >> >>producer
> > >> >> > >>>block
> > >> >> > >>> > >>when
> > >> >> > >>> > >> >> >> >> > close() is called from sender thread so user
> > >>will
> > >> >> > >>>notice
> > >> >> > >>> > >> >>something
> > >> >> > >>> > >> >> >> >>went
> > >> >> > >>> > >> >> >> >> > wrong.
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >> This isn't a great experience either. Why can't
> > >>we
> > >> >>just
> > >> >> > >>>throw
> > >> >> > >>> > >>an
> > >> >> > >>> > >> >> >> >>exception
> > >> >> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd
> like
> > >> >>the
> > >> >> > >>>user to
> > >> >> > >>> > >> >>know.
> > >> >> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong
> and
> > >> >> > >>>annoying.
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > >> >> > >>> > >><jay.kreps@gmail.com <javascript:;>>
> > >> >> > >>> > >> >> >> wrote:
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >> > Cool.
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > I think blocking is good or alternately
> > >>throwing
> > >> >>an
> > >> >> > >>> exception
> > >> >> > >>> > >> >> >>directly
> > >> >> > >>> > >> >> >> >> from
> > >> >> > >>> > >> >> >> >> > close(). Basically I would just worry about
> > >>subtly
> > >> >> > >>>doing
> > >> >> > >>> > >> >>something
> > >> >> > >>> > >> >> >> >> slightly
> > >> >> > >>> > >> >> >> >> > different from what the user asked for as it
> > >>will
> > >> >>be
> > >> >> > >>>hard
> > >> >> > >>> to
> > >> >> > >>> > >> >>notice
> > >> >> > >>> > >> >> >> >>that
> > >> >> > >>> > >> >> >> >> > behavior difference.
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > -Jay
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> > >> >> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > wrote:
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >> > > Hi Jay,
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
> > >> >>thinks
> > >> >> as
> > >> >> > >>> long
> > >> >> > >>> > >>as
> > >> >> > >>> > >> >>we
> > >> >> > >>> > >> >> >> >>have
> > >> >> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> > >> >> interface,
> > >> >> > >>> there
> > >> >> > >>> > >> >>would
> > >> >> > >>> > >> >> >> >>be no
> > >> >> > >>> > >> >> >> >> > > problem. And I also agree it is better if
> we
> > >>can
> > >> >> make
> > >> >> > >>> > >>producer
> > >> >> > >>> > >> >> >>block
> > >> >> > >>> > >> >> >> >> when
> > >> >> > >>> > >> >> >> >> > > close() is called from sender thread so
> user
> > >> >>will
> > >> >> > >>>notice
> > >> >> > >>> > >> >> >>something
> > >> >> > >>> > >> >> >> >>went
> > >> >> > >>> > >> >> >> >> > > wrong.
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > Thanks.
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> > >> >> > >>><jay.kreps@gmail.com <javascript:;>>
> > >> >> > >>> > >> wrote:
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > >Hey Jiangjie,
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >I think this is going to be very confusing
> > >>that
> > >> >> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> > >> >> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> > >> >> > >>> > >> >> >> >> > > >I understand this appears in other apis,
> > >>but it
> > >> >> is a
> > >> >> > >>> > >>constant
> > >> >> > >>> > >> >> >> >>cause of
> > >> >> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
> > >>need a
> > >> >> way
> > >> >> > >>>to
> > >> >> > >>> > >>wait
> > >> >> > >>> > >> >> >> >> indefinitely
> > >> >> > >>> > >> >> >> >> > > >as
> > >> >> > >>> > >> >> >> >> > > >we already have close() so having a
> magical
> > >> >> constant
> > >> >> > >>>for
> > >> >> > >>> > >> >>that is
> > >> >> > >>> > >> >> >> >> > > >redundant.
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
> > >>already
> > >> >> > >>>possible
> > >> >> > >>> > >>and
> > >> >> > >>> > >> >> >>would
> > >> >> > >>> > >> >> >> >> block
> > >> >> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
> > >>change
> > >> >> the
> > >> >> > >>> > >>behavior
> > >> >> > >>> > >> >>is
> > >> >> > >>> > >> >> >> >> probably
> > >> >> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close()
> in
> > >> >>the
> > >> >> > >>> callback
> > >> >> > >>> > >> >>there
> > >> >> > >>> > >> >> >>is
> > >> >> > >>> > >> >> >> >> > > >actually
> > >> >> > >>> > >> >> >> >> > > >some misunderstanding and they need to
> think
> > >> >>more,
> > >> >> > >>> > >>silently
> > >> >> > >>> > >> >> >>making
> > >> >> > >>> > >> >> >> >> this
> > >> >> > >>> > >> >> >> >> > > >not
> > >> >> > >>> > >> >> >> >> > > >block will hide the problem from them
> which
> > >>is
> > >> >>the
> > >> >> > >>> > >>opposite
> > >> >> > >>> > >> >>of
> > >> >> > >>> > >> >> >> >>what we
> > >> >> > >>> > >> >> >> >> > > >want.
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >-Jay
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie
> > >>Qin
> > >> >> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > > >wrote:
> > >> >> > >>> > >> >> >> >> > > >
> > >> >> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
> > >>thread.
> > >> >> > >>>Since it
> > >> >> > >>> > >> >>seems
> > >> >> > >>> > >> >> >>we
> > >> >> > >>> > >> >> >> >> > probably
> > >> >> > >>> > >> >> >> >> > > >> will have more discussion on this, I am
> > >>just
> > >> >> > >>>replying
> > >> >> > >>> > >>from
> > >> >> > >>> > >> >>the
> > >> >> > >>> > >> >> >> >> > > >>discussion
> > >> >> > >>> > >> >> >> >> > > >> thread here.
> > >> >> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it
> less
> > >> >>like
> > >> >> > >>> > >>half-baked,
> > >> >> > >>> > >> >> >> >>apologize
> > >> >> > >>> > >> >> >> >> > for
> > >> >> > >>> > >> >> >> >> > > >> the rush...
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> > >> >> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
> > >>either
> > >> >> are
> > >> >> > >>>sent
> > >> >> > >>> > >>or
> > >> >> > >>> > >> >> >>reach
> > >> >> > >>> > >> >> >> >> > request
> > >> >> > >>> > >> >> >> >> > > >> timeout.
> > >> >> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
> > >>close
> > >> >> > >>> > >>immediately
> > >> >> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> > >> >> equivalent
> > >> >> > >>>to
> > >> >> > >>> > >> >>close(),
> > >> >> > >>> > >> >> >> >>i.e.
> > >> >> > >>> > >> >> >> >> > Wait
> > >> >> > >>> > >> >> >> >> > > >> until all requests are sent or reach
> > >>request
> > >> >> > >>>timeout
> > >> >> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) -
> try
> > >> >>the
> > >> >> > >>>best to
> > >> >> > >>> > >> >>finish
> > >> >> > >>> > >> >> >> >> sending
> > >> >> > >>> > >> >> >> >> > > >>in 5
> > >> >> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong,
> > >>just
> > >> >> > >>>shutdown
> > >> >> > >>> the
> > >> >> > >>> > >> >> >>producer
> > >> >> > >>> > >> >> >> >> > > >>anyway,
> > >> >> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> About how we define what timeout value
> > >>stands
> > >> >> for,
> > >> >> > >>>I
> > >> >> > >>> > >> >>actually
> > >> >> > >>> > >> >> >> >> > struggled
> > >> >> > >>> > >> >> >> >> > > >>a
> > >> >> > >>> > >> >> >> >> > > >> little bit when wrote the patch.
> > >>Intuitively,
> > >> >> > >>>close(0)
> > >> >> > >>> > >> >>should
> > >> >> > >>> > >> >> >> >>mean
> > >> >> > >>> > >> >> >> >> > > >> immediately, however it seems that all
> the
> > >> >> > >>>existing
> > >> >> > >>> java
> > >> >> > >>> > >> >>class
> > >> >> > >>> > >> >> >> >>have
> > >> >> > >>> > >> >> >> >> > this
> > >> >> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout
> > >>or
> > >> >> never
> > >> >> > >>> > >>timeout
> > >> >> > >>> > >> >> >> >> > > >>(Thread.join(0),
> > >> >> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the
> dilemma
> > >>is
> > >> >> > >>>either we
> > >> >> > >>> > >> >>follow
> > >> >> > >>> > >> >> >>the
> > >> >> > >>> > >> >> >> >> > > >> intuition or we follow the convention.
> > >>What I
> > >> >> > >>>chose is
> > >> >> > >>> > >>to
> > >> >> > >>> > >> >> >>follow
> > >> >> > >>> > >> >> >> >>the
> > >> >> > >>> > >> >> >> >> > > >> convention but document the interface to
> > >>let
> > >> >> user
> > >> >> > >>>be
> > >> >> > >>> > >>aware
> > >> >> > >>> > >> >>of
> > >> >> > >>> > >> >> >>the
> > >> >> > >>> > >> >> >> >> > usage.
> > >> >> > >>> > >> >> >> >> > > >> The reason is that I think
> > >>producer.close()
> > >> >>is a
> > >> >> > >>> public
> > >> >> > >>> > >> >> >> >>interface so
> > >> >> > >>> > >> >> >> >> > it
> > >> >> > >>> > >> >> >> >> > > >> might be better to follow java
> convention.
> > >> >> Whereas
> > >> >> > >>> > >> >>selector is
> > >> >> > >>> > >> >> >> >>not a
> > >> >> > >>> > >> >> >> >> > > >> public interface that used by user, so
> as
> > >> >>long
> > >> >> as
> > >> >> > >>>it
> > >> >> > >>> > >>makes
> > >> >> > >>> > >> >> >>sense
> > >> >> > >>> > >> >> >> >>to
> > >> >> > >>> > >> >> >> >> > us,
> > >> >> > >>> > >> >> >> >> > > >>it
> > >> >> > >>> > >> >> >> >> > > >> is less a problem to be different from
> > >>java
> > >> >> > >>> convention.
> > >> >> > >>> > >> >>That
> > >> >> > >>> > >> >> >>said
> > >> >> > >>> > >> >> >> >> > since
> > >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> > >> >> interface,
> > >> >> > >>>I
> > >> >> > >>> > >>think
> > >> >> > >>> > >> >>it
> > >> >> > >>> > >> >> >> >>also
> > >> >> > >>> > >> >> >> >> > makes
> > >> >> > >>> > >> >> >> >> > > >> sense to make producer.close() to have
> the
> > >> >>same
> > >> >> > >>> > >>definition
> > >> >> > >>> > >> >>of
> > >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout
> in
> > >> >>close
> > >> >> > >>>would
> > >> >> > >>> > >>be
> > >> >> > >>> > >> >> >> >>separating
> > >> >> > >>> > >> >> >> >> > the
> > >> >> > >>> > >> >> >> >> > > >> close timeout from request timeout,
> which
> > >> >> probably
> > >> >> > >>> makes
> > >> >> > >>> > >> >> >>sense. I
> > >> >> > >>> > >> >> >> >> > would
> > >> >> > >>> > >> >> >> >> > > >> guess typically the request timeout
> would
> > >>be
> > >> >> long
> > >> >> > >>> (e.g.
> > >> >> > >>> > >>60
> > >> >> > >>> > >> >> >> >>seconds)
> > >> >> > >>> > >> >> >> >> > > >> because we might want to consider
> retries
> > >> >>with
> > >> >> > >>>back
> > >> >> > >>> off
> > >> >> > >>> > >> >>time.
> > >> >> > >>> > >> >> >>If
> > >> >> > >>> > >> >> >> >>we
> > >> >> > >>> > >> >> >> >> > have
> > >> >> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in
> worst
> > >> >>case
> > >> >> > >>>that
> > >> >> > >>> > >>could
> > >> >> > >>> > >> >>take
> > >> >> > >>> > >> >> >> >>up to
> > >> >> > >>> > >> >> >> >> > > >> several minutes to complete all the
> > >>requests.
> > >> >> But
> > >> >> > >>>when
> > >> >> > >>> > >>we
> > >> >> > >>> > >> >> >>close a
> > >> >> > >>> > >> >> >> >> > > >> producer, we might not want to wait for
> > >>that
> > >> >> long
> > >> >> > >>>as
> > >> >> > >>> it
> > >> >> > >>> > >> >>might
> > >> >> > >>> > >> >> >> >>cause
> > >> >> > >>> > >> >> >> >> > some
> > >> >> > >>> > >> >> >> >> > > >> other problem like deployment tool
> > >>timeout.
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> There is also a subtle difference
> between
> > >> >> > >>> close(timeout)
> > >> >> > >>> > >> >>and
> > >> >> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
> > >>flush()
> > >> >>is
> > >> >> to
> > >> >> > >>> write
> > >> >> > >>> > >> >>data
> > >> >> > >>> > >> >> >>to
> > >> >> > >>> > >> >> >> >>the
> > >> >> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to
> wait
> > >> >>until
> > >> >> > >>> request
> > >> >> > >>> > >> >> >>timeout.
> > >> >> > >>> > >> >> >> >>I
> > >> >> > >>> > >> >> >> >> > think
> > >> >> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks
> strange.
> > >>On
> > >> >>the
> > >> >> > >>>other
> > >> >> > >>> > >> >>hand,
> > >> >> > >>> > >> >> >>the
> > >> >> > >>> > >> >> >> >>top
> > >> >> > >>> > >> >> >> >> > > >> priority for close() is to close the
> > >>producer
> > >> >> > >>>rather
> > >> >> > >>> > >>than
> > >> >> > >>> > >> >> >>flush()
> > >> >> > >>> > >> >> >> >> > data,
> > >> >> > >>> > >> >> >> >> > > >>so
> > >> >> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on
> bounded
> > >> >> waiting
> > >> >> > >>>for
> > >> >> > >>> > >>its
> > >> >> > >>> > >> >>main
> > >> >> > >>> > >> >> >> >>job.
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
> > >> >>flag.
> > >> >> It
> > >> >> > >>>is
> > >> >> > >>> > >>not a
> > >> >> > >>> > >> >> >>public
> > >> >> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
> > >>Changes
> > >> >> > >>>section
> > >> >> > >>> > >> >>which I
> > >> >> > >>> > >> >> >> >> thought
> > >> >> > >>> > >> >> >> >> > > >>was
> > >> >> > >>> > >> >> >> >> > > >> supposed to provide implementation
> > >>details.
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> > >> >> suggestions!
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> > >> >> > >>> jqin@linkedin.com <javascript:;>>
> > >> >> > >>> > >> >> wrote:
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> > >> >> comments.
> > >> >> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
> > >>if
> > >> >>no
> > >> >> > >>>further
> > >> >> > >>> > >> >> >>comments
> > >> >> > >>> > >> >> >> >>are
> > >> >> > >>> > >> >> >> >> > > >> >received by tomorrow.
> > >> >> > >>> > >> >> >> >> > > >> >
> > >> >> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > >> >> > >>> > >> >> >> >> > > >> >
> > >> >> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> > >> >> > >>><jay.kreps@gmail.com <javascript:;>
> > >> >> > >>> >
> > >> >> > >>> > >> >>wrote:
> > >> >> > >>> > >> >> >> >> > > >> >
> > >> >> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation
> and
> > >>use
> > >> >> > >>>cases
> > >> >> > >>> for
> > >> >> > >>> > >>the
> > >> >> > >>> > >> >> >> >>feature?
> > >> >> > >>> > >> >> >> >> > > >>This
> > >> >> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way
> > >>of
> > >> >> > >>>aborting
> > >> >> > >>> > >>from
> > >> >> > >>> > >> >> >>inside
> > >> >> > >>> > >> >> >> >>the
> > >> >> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really
> explain
> > >> >>that
> > >> >> > >>>usage
> > >> >> > >>> or
> > >> >> > >>> > >>why
> > >> >> > >>> > >> >> >>other
> > >> >> > >>> > >> >> >> >> > people
> > >> >> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
> > >>list
> > >> >> the
> > >> >> > >>> > >>primary
> > >> >> > >>> > >> >>use
> > >> >> > >>> > >> >> >> >>case
> > >> >> > >>> > >> >> >> >> for
> > >> >> > >>> > >> >> >> >> > > >> >>having
> > >> >> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was
> > >>to
> > >> >> avoid
> > >> >> > >>> > >>blocking
> > >> >> > >>> > >> >>too
> > >> >> > >>> > >> >> >> >>long
> > >> >> > >>> > >> >> >> >> on
> > >> >> > >>> > >> >> >> >> > > >> >>shutdown.
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>-Jay
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
> > >>Jiangjie
> > >> >>Qin
> > >> >> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > >> >> > >>> > >> >> >> >> > > >> >
> > >> >> > >>> > >> >> >> >> > > >> >>wrote:
> > >> >> > >>> > >> >> >> >> > > >> >>
> > >> >> > >>> > >> >> >> >> > > >> >>> Hi,
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> > >> >> > >>>close(timeout)
> > >> >> > >>> to
> > >> >> > >>> > >>new
> > >> >> > >>> > >> >> >> >> producer.
> > >> >> > >>> > >> >> >> >> > > >>Most
> > >> >> > >>> > >> >> >> >> > > >> >>>of
> > >> >> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
> > >> >>KAFKA-1660
> > >> >> > >>>where
> > >> >> > >>> > >>Parth
> > >> >> > >>> > >> >> >> >> Brahmbhatt
> > >> >> > >>> > >> >> >> >> > > >>has
> > >> >> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> > >> >> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
> > >>we
> > >> >>are
> > >> >> > >>>going
> > >> >> > >>> > >> >>through
> > >> >> > >>> > >> >> >>the
> > >> >> > >>> > >> >> >> >> KIP
> > >> >> > >>> > >> >> >> >> > > >> >>>process.
> > >> >> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >>
> > >> >> > >>> > >>
> > >> >> > >>> >
> > >> >> > >>>
> > >> >> >
> > >>
> > >>>>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=537397
> > >>>>8
> > >> >> > >>> > >> >> >> >> > > >> >>>2
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >> >>> Thanks.
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > >> >> > >>> > >> >> >> >> > > >> >>>
> > >> >> > >>> > >> >> >> >> > > >> >
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > > >>
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> > >
> > >> >> > >>> > >> >> >> >> >
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >> --
> > >> >> > >>> > >> >> >> >> Thanks,
> > >> >> > >>> > >> >> >> >> Neha
> > >> >> > >>> > >> >> >> >>
> > >> >> > >>> > >> >> >> >
> > >> >> > >>> > >> >> >> >
> > >> >> > >>> > >> >> >> >
> > >> >> > >>> > >> >> >> >--
> > >> >> > >>> > >> >> >> >-- Guozhang
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >>
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >
> > >> >> > >>> > >> >> >--
> > >> >> > >>> > >> >> >-- Guozhang
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >>
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >
> > >> >> > >>> > >> >--
> > >> >> > >>> > >> >-- Guozhang
> > >> >> > >>> > >>
> > >> >> > >>> > >>
> > >> >> > >>> >
> > >> >> > >>> >
> > >> >> > >>>
> > >> >> > >>
> > >> >> > >>
> > >> >> > >>
> > >> >> > >>--
> > >> >> > >>-- Guozhang
> > >> >> > >
> > >> >> >
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >> >--
> > >> >Sent from Gmail Mobile
> > >>
> > >>
> >
> >
>



-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

Yes, that probably is simpler. Basically, close() and close(timeout) will
always wait for the sender thread to complete. Close(0) will just initiate
the shutdown of the sender thread, but not waiting for the thread to
complete.

Thanks,

Jun

On Fri, Mar 20, 2015 at 11:29 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> That’s a very good point.
> Currently if timeout > 0, it will wait up to timeout for the sender thread
> to complete. After that it will do a force close but not block.
>
> Maybe we should make it to be:
> If timeout > 0, wait up to timeout for the sender thread to complete.
> After that, make a force close but still try to join the sender thread.
> If timeout = 0, it is non-blocking.
>
> Jiangjie (Becket) Qin
>
> On 3/20/15, 9:16 AM, "Jun Rao" <ju...@confluent.io> wrote:
>
> >Got it. Just to clarify, does close(timeout) always wait for the sender
> >thread to complete?
> >
> >Thanks,
> >
> >Jun
> >
> >On Fri, Mar 20, 2015 at 7:41 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> >wrote:
> >
> >> I think the we agreed that we are going to log an error and block. By
> >> doing this we can make sure the error log to be checked by user in all
> >> cases.
> >> If we silently replace close() to close(0) in sender thread, in some
> >>cases
> >> such as send error during a normal close(), user might not notice
> >> something went wrong.
> >>
> >> On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:
> >>
> >> >close can probably just getCurrentThread and check if == senderThread.
> >> >I'm actually not sure from this thread if there was clear agreement on
> >> >whether it should change close(timeout)/close() to close(0) or if it
> >> >should
> >> >log an error and block up to the timeout.
> >> >
> >> >On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
> >> >
> >> >> So in (1), if a close() or close(timeout) is called from a callback,
> >>we
> >> >> will just turn that into a close(0)? Implementation wise, how do we
> >>know
> >> >> whether a close() call is made from the sender thread or not?
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Jun
> >> >>
> >> >> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
> >> >><jq...@linkedin.com.invalid>
> >> >> wrote:
> >> >>
> >> >> > It looks we have another option and are now deciding between the
> >> >> following
> >> >> > two interfaces:
> >> >> >
> >> >> > 1. Close() + close(timeout)
> >> >> >   - timeout could be either positive or zero.
> >> >> >   - only close(0) can be called from sender thread
> >> >> >
> >> >> > 2. Close() + abort() + close(timeout)
> >> >> >   - timeout can either be positive or zero
> >> >> >   - only abort() can be called from sender thread
> >> >> >
> >> >> >   - abort() is equivalent to close(0) in 1) but does not join
> >>sender
> >> >> > thread and does not close metrics.
> >> >> >   - Another thread has to call close() or close(timeout) in order
> >>to
> >> >>make
> >> >> > sure the resources in producer are gone.
> >> >> >
> >> >> > The tow approach provides the same function we need, the
> >>difference is
> >> >> > approach 2) follows convention of close() and abort(). On the other
> >> >>hand,
> >> >> > approach 1) saves one interface compared with approach 2) but does
> >>not
> >> >> > follow the convention.
> >> >> >
> >> >> > When the two approaches come to user code, it is probably something
> >> >>like
> >> >> > this:
> >> >> >
> >> >> > Try {
> >> >> >   While(!finished)
> >> >> >     Producer.send(record, callback)
> >> >> > } catch (Exception e) {
> >> >> >   Producer.close(5)
> >> >> > }
> >> >> >
> >> >> > Class CallbackImpl implements Callback {
> >> >> >   onCompletion(RecordMetadata metadata Exception e) {
> >> >> >     If (e != null)
> >> >> >       Abort() / close()
> >> >> >   }
> >> >> > }
> >> >> >
> >> >> > Because the two approach leads to almost the same user code,
> >>assuming
> >> >> > users are always calling producer.close() as a clean up step,
> >> >>personally
> >> >> I
> >> >> > prefer approach 2) as it follows convention.
> >> >> >
> >> >> > Any thoughts?
> >> >> >
> >> >> > Jiangjie (Becket) Qin
> >> >> >
> >> >> >
> >> >> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
> >> >><javascript:;>>
> >> >> wrote:
> >> >> >
> >> >> > >Hi Jun,
> >> >> > >
> >> >> > >Yes, as Guozhang said, the main reason we set a flag is because
> >> >>close(0)
> >> >> > >is expected to be called by sender thread itself.
> >> >> > >If we want to maintain the semantic meaning of close(), one
> >> >>alternative
> >> >> is
> >> >> > >to have an abort() method does the same thing as close(0) except
> >> >> cleanup.
> >> >> > >And in close(timeout), after timeout we call abort() and join the
> >> >>sender
> >> >> > >thread. This was one of the previous proposal. We merged abort to
> >> >> close(0)
> >> >> > >because they are almost doing the same thing. But from what you
> >> >> mentioned,
> >> >> > >it might make sense to have two separate methods.
> >> >> > >
> >> >> > >Thanks.
> >> >> > >
> >> >> > >Jiangjie (Becket) Qin
> >> >> > >
> >> >> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
> >> >> <javascript:;>> wrote:
> >> >> > >
> >> >> > >>Yeah in this sense the sender thread will not exist immediately
> >>in
> >> >>the
> >> >> > >>close(0) call, but will only terminate after the current response
> >> >>batch
> >> >> > >>has
> >> >> > >>been processed, as will the producer instance itself.
> >> >> > >>
> >> >> > >>There is a reason for this though: for a clean shutdown the
> >>caller
> >> >> thread
> >> >> > >>has to wait for the sender thread to join before closing the
> >> >>producer
> >> >> > >>instance, but this cannot be achieve if close(0) is called by the
> >> >> sender
> >> >> > >>thread itself (for example in KAFKA-1659, there is a proposal
> >>from
> >> >> Andrew
> >> >> > >>Stein on using thread.interrupt and thread.stop, but if it is
> >> >>called by
> >> >> > >>the
> >> >> > >>ioThread itself the stop call will fail). Hence we came up with
> >>the
> >> >> flag
> >> >> > >>approach to let the sender thread to close as soon as it is at
> >>the
> >> >> > >>barrier
> >> >> > >>of the run loop.
> >> >> > >>
> >> >> > >>Guozhang
> >> >> > >>
> >> >> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
> >> >> <javascript:;>> wrote:
> >> >> > >>
> >> >> > >>> Hmm, does that mean that after close(0), the sender thread is
> >>not
> >> >> > >>>necessary
> >> >> > >>> gone? Normally, after closing an entity, we expect all internal
> >> >> threads
> >> >> > >>> associated with the entity are shut down completely.
> >> >> > >>>
> >> >> > >>> Thanks,
> >> >> > >>>
> >> >> > >>> Jun
> >> >> > >>>
> >> >> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> >> >> > >>><jq...@linkedin.com.invalid>
> >> >> > >>> wrote:
> >> >> > >>>
> >> >> > >>> > Hi Jun,
> >> >> > >>> >
> >> >> > >>> > Close(0) will set two flags in sender. Running=false and a
> >>newly
> >> >> > >>>added
> >> >> > >>> > forceClose=true. It will also set accumulator.closed=true so
> >>no
> >> >> > >>>further
> >> >> > >>> > producer.send() will succeed.
> >> >> > >>> > The sender thread will finish executing all the callbacks in
> >> >> current
> >> >> > >>> batch
> >> >> > >>> > of responses, then it will see the forceClose flag. It will
> >>just
> >> >> fail
> >> >> > >>>all
> >> >> > >>> > the incomplete batches in the producer and exit.
> >> >> > >>> > So close(0) is a non-blocking call and sender thread will not
> >> >>try
> >> >> to
> >> >> > >>>join
> >> >> > >>> > itself in close(0).
> >> >> > >>> >
> >> >> > >>> > Thanks.
> >> >> > >>> >
> >> >> > >>> > Jiangjie (Becket) Qin
> >> >> > >>> >
> >> >> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io
> >> <javascript:;>>
> >> >> wrote:
> >> >> > >>> >
> >> >> > >>> > >How does close(0) work if it's called from the sender
> >>thread?
> >> >>If
> >> >> > >>> close(0)
> >> >> > >>> > >needs to wait for the sender thread to join, wouldn't this
> >> >>cause a
> >> >> > >>> > >deadlock?
> >> >> > >>> > >
> >> >> > >>> > >Thanks,
> >> >> > >>> > >
> >> >> > >>> > >Jun
> >> >> > >>> > >
> >> >> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> >> >> > >>><jqin@linkedin.com.invalid
> >> >> > >>> >
> >> >> > >>> > >wrote:
> >> >> > >>> > >
> >> >> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
> >> >>without
> >> >> > >>> > >> discussing with you :)
> >> >> > >>> > >>
> >> >> > >>> > >> Jiangjie (Becket) Qin
> >> >> > >>> > >>
> >> >> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangguoz@gmail.com
> >> >> <javascript:;>> wrote:
> >> >> > >>> > >>
> >> >> > >>> > >> >Thanks Jiangjie,
> >> >> > >>> > >> >
> >> >> > >>> > >> >After talking to you offline on this, I have been
> >>convinced
> >> >>and
> >> >> > >>> > >>changed my
> >> >> > >>> > >> >preference to blocking. The immediate shutdown approach
> >>does
> >> >> have
> >> >> > >>> some
> >> >> > >>> > >> >unsafeness in some cases.
> >> >> > >>> > >> >
> >> >> > >>> > >> >Guozhang
> >> >> > >>> > >> >
> >> >> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >> >> > >>> > >><jqin@linkedin.com.invalid
> >> >> > >>> > >> >
> >> >> > >>> > >> >wrote:
> >> >> > >>> > >> >
> >> >> > >>> > >> >> It looks that the problem we want to solve and the
> >> >>purpose we
> >> >> > >>>want
> >> >> > >>> to
> >> >> > >>> > >> >> achieve is:
> >> >> > >>> > >> >> If user uses close() in callback, we want to let user
> >>be
> >> >> aware
> >> >> > >>>that
> >> >> > >>> > >>they
> >> >> > >>> > >> >> should use close(0) instead of close() in the callback.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> We have agreed that we will have an error log to inform
> >> >>user
> >> >> > >>>about
> >> >> > >>> > >>this
> >> >> > >>> > >> >> mis-usage. The options differ in the way how we can
> >>force
> >> >> user
> >> >> > >>>to
> >> >> > >>> > >>take a
> >> >> > >>> > >> >> look at that error log.
> >> >> > >>> > >> >> There are two scenarios:
> >> >> > >>> > >> >> 1. User does not expect the program to exit.
> >> >> > >>> > >> >> 2. User expect the program to exit.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> For scenario 1), blocking will probably delay the
> >> >>discovery
> >> >> of
> >> >> > >>>the
> >> >> > >>> > >> >> problem. Calling close(0) exposes the problem quicker.
> >>In
> >> >> this
> >> >> > >>> > >>scenario
> >> >> > >>> > >> >> producer just encounter a send failure when running
> >> >>normally.
> >> >> > >>> > >> >> For scenario 2), blocking will expose the problem
> >>quick.
> >> >> > >>>Calling
> >> >> > >>> > >> >>close(-1)
> >> >> > >>> > >> >> might hide the problem. This scenario might include: a)
> >> >>Unit
> >> >> > >>>test
> >> >> > >>> > >>for a
> >> >> > >>> > >> >> send failure. b) Message sending during a close() call
> >> >>from a
> >> >> > >>>user
> >> >> > >>> > >> >>thread.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> So as a summary table:
> >> >> > >>> > >> >>
> >> >> > >>> > >> >>                   Scenario 1)
> >> >>  Scenario
> >> >> > >>>2)
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Blocking      Delay problem discovery
> >>Guaranteed
> >> >> > >>>problem
> >> >> > >>> > >> >>discovery
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
> >> >>might
> >> >> be
> >> >> > >>> hidden
> >> >> > >>> > >> >>
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Personally I prefer blocking because it seems providing
> >> >>more
> >> >> > >>> > >>guarantees
> >> >> > >>> > >> >> and safer.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Thanks.
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> Jiangjie (Becket) Qin
> >> >> > >>> > >> >>
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
> >><wangguoz@gmail.com
> >> >> <javascript:;>>
> >> >> > >>>wrote:
> >> >> > >>> > >> >>
> >> >> > >>> > >> >> >HI Jiangjie,
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >As far as I understand calling close() in the
> >>ioThread is
> >> >> not
> >> >> > >>> > >>common,
> >> >> > >>> > >> >>as
> >> >> > >>> > >> >> >it
> >> >> > >>> > >> >> >may only trigger when we saw some non-retriable error.
> >> >>Hence
> >> >> > >>>when
> >> >> > >>> > >>user
> >> >> > >>> > >> >>run
> >> >> > >>> > >> >> >their program it is unlikely that close() will be
> >> >>triggered
> >> >> > >>>and
> >> >> > >>> > >>problem
> >> >> > >>> > >> >> >will be detected. So it seems to me that from the
> >>error
> >> >> > >>>detection
> >> >> > >>> > >> >>aspect
> >> >> > >>> > >> >> >these two options seems to be the same as people will
> >> >> usually
> >> >> > >>> > >>detect it
> >> >> > >>> > >> >> >from the producer metrics all dropping to 0.
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >Guozhang
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >> >> > >>> > >> >><jq...@linkedin.com.invalid>
> >> >> > >>> > >> >> >wrote:
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >> It seems there are two options we can choose from
> >>when
> >> >> > >>>close()
> >> >> > >>> is
> >> >> > >>> > >> >>called
> >> >> > >>> > >> >> >> from sender thread (callback):
> >> >> > >>> > >> >> >> 1. Log an error and close the producer using
> >>close(-1)
> >> >> > >>> > >> >> >> 2. Log an error and block.
> >> >> > >>> > >> >> >> (Throwing an exception will not work because we
> >>catch
> >> >>all
> >> >> > >>>the
> >> >> > >>> > >> >>exception
> >> >> > >>> > >> >> >> thrown from user callback. It will just lead to an
> >> >>error
> >> >> > >>>log.)
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> My concern for the first option is that the producer
> >> >>will
> >> >> be
> >> >> > >>> > >>closed
> >> >> > >>> > >> >>even
> >> >> > >>> > >> >> >> if we logged and error. I am wondering if some user
> >> >>would
> >> >> > >>>not
> >> >> > >>> even
> >> >> > >>> > >> >>take
> >> >> > >>> > >> >> >>a
> >> >> > >>> > >> >> >> look at the log if producer is closed normally.
> >>Because
> >> >> from
> >> >> > >>>the
> >> >> > >>> > >> >> >>programs
> >> >> > >>> > >> >> >> behavior, everything looks good. If that is the
> >>case,
> >> >>the
> >> >> > >>>error
> >> >> > >>> > >> >>message
> >> >> > >>> > >> >> >>we
> >> >> > >>> > >> >> >> logged probably will just be ignored until some day
> >> >>when
> >> >> > >>>people
> >> >> > >>> > >>check
> >> >> > >>> > >> >> >>the
> >> >> > >>> > >> >> >> log and see it.
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> As for the second option, because producer does not
> >> >>close
> >> >> > >>>but
> >> >> > >>> > >>blocks.
> >> >> > >>> > >> >> >>User
> >> >> > >>> > >> >> >> will notice this the first time they run the
> >>program.
> >> >>They
> >> >> > >>> > >>probably
> >> >> > >>> > >> >>will
> >> >> > >>> > >> >> >> look at the log to see why producer could not be
> >>closed
> >> >> and
> >> >> > >>>they
> >> >> > >>> > >>will
> >> >> > >>> > >> >> >>see
> >> >> > >>> > >> >> >> the error log we put there. So they will get
> >>informed
> >> >> about
> >> >> > >>>this
> >> >> > >>> > >> >> >>mis-usage
> >> >> > >>> > >> >> >> of close() in sender thread the first time they run
> >>the
> >> >> code
> >> >> > >>> > >>instead
> >> >> > >>> > >> >>of
> >> >> > >>> > >> >> >> some time later.
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> Personally I prefer the second one because it is
> >>more
> >> >> > >>>obvious
> >> >> > >>> that
> >> >> > >>> > >> >> >> something was wrong.
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> Jiangjie (Becket) Qin
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
> >> >><wangguoz@gmail.com
> >> >> <javascript:;>>
> >> >> > >>> wrote:
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >> >Yeah I agree we should not silently change the
> >> >>behavior
> >> >> of
> >> >> > >>>the
> >> >> > >>> > >> >>function
> >> >> > >>> > >> >> >> >with the given parameters; and I would prefer
> >> >> > >>> > >> >> >>error-logging-and-shutdown
> >> >> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> >> >> > >>>suggested
> >> >> > >>> > >> >>blocking
> >> >> > >>> > >> >> >> >would also not proceed with sending any data, bu
> >>will
> >> >> just
> >> >> > >>>let
> >> >> > >>> > >> >>users to
> >> >> > >>> > >> >> >> >realize the issue later than sooner.
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >> >> > >>> > >><neha@confluent.io <javascript:;>>
> >> >> > >>> > >> >> >>wrote:
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > And I also agree it is better if we can make
> >> >>producer
> >> >> > >>>block
> >> >> > >>> > >>when
> >> >> > >>> > >> >> >> >> > close() is called from sender thread so user
> >>will
> >> >> > >>>notice
> >> >> > >>> > >> >>something
> >> >> > >>> > >> >> >> >>went
> >> >> > >>> > >> >> >> >> > wrong.
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >> This isn't a great experience either. Why can't
> >>we
> >> >>just
> >> >> > >>>throw
> >> >> > >>> > >>an
> >> >> > >>> > >> >> >> >>exception
> >> >> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
> >> >>the
> >> >> > >>>user to
> >> >> > >>> > >> >>know.
> >> >> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> >> >> > >>>annoying.
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >> >> > >>> > >><jay.kreps@gmail.com <javascript:;>>
> >> >> > >>> > >> >> >> wrote:
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >> > Cool.
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > I think blocking is good or alternately
> >>throwing
> >> >>an
> >> >> > >>> exception
> >> >> > >>> > >> >> >>directly
> >> >> > >>> > >> >> >> >> from
> >> >> > >>> > >> >> >> >> > close(). Basically I would just worry about
> >>subtly
> >> >> > >>>doing
> >> >> > >>> > >> >>something
> >> >> > >>> > >> >> >> >> slightly
> >> >> > >>> > >> >> >> >> > different from what the user asked for as it
> >>will
> >> >>be
> >> >> > >>>hard
> >> >> > >>> to
> >> >> > >>> > >> >>notice
> >> >> > >>> > >> >> >> >>that
> >> >> > >>> > >> >> >> >> > behavior difference.
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > -Jay
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> >> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > wrote:
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >> > > Hi Jay,
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
> >> >>thinks
> >> >> as
> >> >> > >>> long
> >> >> > >>> > >>as
> >> >> > >>> > >> >>we
> >> >> > >>> > >> >> >> >>have
> >> >> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> >> >> interface,
> >> >> > >>> there
> >> >> > >>> > >> >>would
> >> >> > >>> > >> >> >> >>be no
> >> >> > >>> > >> >> >> >> > > problem. And I also agree it is better if we
> >>can
> >> >> make
> >> >> > >>> > >>producer
> >> >> > >>> > >> >> >>block
> >> >> > >>> > >> >> >> >> when
> >> >> > >>> > >> >> >> >> > > close() is called from sender thread so user
> >> >>will
> >> >> > >>>notice
> >> >> > >>> > >> >> >>something
> >> >> > >>> > >> >> >> >>went
> >> >> > >>> > >> >> >> >> > > wrong.
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > Thanks.
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> >> >> > >>><jay.kreps@gmail.com <javascript:;>>
> >> >> > >>> > >> wrote:
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > >Hey Jiangjie,
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >I think this is going to be very confusing
> >>that
> >> >> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> >> >> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> >> >> > >>> > >> >> >> >> > > >I understand this appears in other apis,
> >>but it
> >> >> is a
> >> >> > >>> > >>constant
> >> >> > >>> > >> >> >> >>cause of
> >> >> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
> >>need a
> >> >> way
> >> >> > >>>to
> >> >> > >>> > >>wait
> >> >> > >>> > >> >> >> >> indefinitely
> >> >> > >>> > >> >> >> >> > > >as
> >> >> > >>> > >> >> >> >> > > >we already have close() so having a magical
> >> >> constant
> >> >> > >>>for
> >> >> > >>> > >> >>that is
> >> >> > >>> > >> >> >> >> > > >redundant.
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
> >>already
> >> >> > >>>possible
> >> >> > >>> > >>and
> >> >> > >>> > >> >> >>would
> >> >> > >>> > >> >> >> >> block
> >> >> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
> >>change
> >> >> the
> >> >> > >>> > >>behavior
> >> >> > >>> > >> >>is
> >> >> > >>> > >> >> >> >> probably
> >> >> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
> >> >>the
> >> >> > >>> callback
> >> >> > >>> > >> >>there
> >> >> > >>> > >> >> >>is
> >> >> > >>> > >> >> >> >> > > >actually
> >> >> > >>> > >> >> >> >> > > >some misunderstanding and they need to think
> >> >>more,
> >> >> > >>> > >>silently
> >> >> > >>> > >> >> >>making
> >> >> > >>> > >> >> >> >> this
> >> >> > >>> > >> >> >> >> > > >not
> >> >> > >>> > >> >> >> >> > > >block will hide the problem from them which
> >>is
> >> >>the
> >> >> > >>> > >>opposite
> >> >> > >>> > >> >>of
> >> >> > >>> > >> >> >> >>what we
> >> >> > >>> > >> >> >> >> > > >want.
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >-Jay
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie
> >>Qin
> >> >> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > > >wrote:
> >> >> > >>> > >> >> >> >> > > >
> >> >> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
> >>thread.
> >> >> > >>>Since it
> >> >> > >>> > >> >>seems
> >> >> > >>> > >> >> >>we
> >> >> > >>> > >> >> >> >> > probably
> >> >> > >>> > >> >> >> >> > > >> will have more discussion on this, I am
> >>just
> >> >> > >>>replying
> >> >> > >>> > >>from
> >> >> > >>> > >> >>the
> >> >> > >>> > >> >> >> >> > > >>discussion
> >> >> > >>> > >> >> >> >> > > >> thread here.
> >> >> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
> >> >>like
> >> >> > >>> > >>half-baked,
> >> >> > >>> > >> >> >> >>apologize
> >> >> > >>> > >> >> >> >> > for
> >> >> > >>> > >> >> >> >> > > >> the rush...
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> >> >> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
> >>either
> >> >> are
> >> >> > >>>sent
> >> >> > >>> > >>or
> >> >> > >>> > >> >> >>reach
> >> >> > >>> > >> >> >> >> > request
> >> >> > >>> > >> >> >> >> > > >> timeout.
> >> >> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
> >>close
> >> >> > >>> > >>immediately
> >> >> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> >> >> equivalent
> >> >> > >>>to
> >> >> > >>> > >> >>close(),
> >> >> > >>> > >> >> >> >>i.e.
> >> >> > >>> > >> >> >> >> > Wait
> >> >> > >>> > >> >> >> >> > > >> until all requests are sent or reach
> >>request
> >> >> > >>>timeout
> >> >> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
> >> >>the
> >> >> > >>>best to
> >> >> > >>> > >> >>finish
> >> >> > >>> > >> >> >> >> sending
> >> >> > >>> > >> >> >> >> > > >>in 5
> >> >> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong,
> >>just
> >> >> > >>>shutdown
> >> >> > >>> the
> >> >> > >>> > >> >> >>producer
> >> >> > >>> > >> >> >> >> > > >>anyway,
> >> >> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> About how we define what timeout value
> >>stands
> >> >> for,
> >> >> > >>>I
> >> >> > >>> > >> >>actually
> >> >> > >>> > >> >> >> >> > struggled
> >> >> > >>> > >> >> >> >> > > >>a
> >> >> > >>> > >> >> >> >> > > >> little bit when wrote the patch.
> >>Intuitively,
> >> >> > >>>close(0)
> >> >> > >>> > >> >>should
> >> >> > >>> > >> >> >> >>mean
> >> >> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> >> >> > >>>existing
> >> >> > >>> java
> >> >> > >>> > >> >>class
> >> >> > >>> > >> >> >> >>have
> >> >> > >>> > >> >> >> >> > this
> >> >> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout
> >>or
> >> >> never
> >> >> > >>> > >>timeout
> >> >> > >>> > >> >> >> >> > > >>(Thread.join(0),
> >> >> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma
> >>is
> >> >> > >>>either we
> >> >> > >>> > >> >>follow
> >> >> > >>> > >> >> >>the
> >> >> > >>> > >> >> >> >> > > >> intuition or we follow the convention.
> >>What I
> >> >> > >>>chose is
> >> >> > >>> > >>to
> >> >> > >>> > >> >> >>follow
> >> >> > >>> > >> >> >> >>the
> >> >> > >>> > >> >> >> >> > > >> convention but document the interface to
> >>let
> >> >> user
> >> >> > >>>be
> >> >> > >>> > >>aware
> >> >> > >>> > >> >>of
> >> >> > >>> > >> >> >>the
> >> >> > >>> > >> >> >> >> > usage.
> >> >> > >>> > >> >> >> >> > > >> The reason is that I think
> >>producer.close()
> >> >>is a
> >> >> > >>> public
> >> >> > >>> > >> >> >> >>interface so
> >> >> > >>> > >> >> >> >> > it
> >> >> > >>> > >> >> >> >> > > >> might be better to follow java convention.
> >> >> Whereas
> >> >> > >>> > >> >>selector is
> >> >> > >>> > >> >> >> >>not a
> >> >> > >>> > >> >> >> >> > > >> public interface that used by user, so as
> >> >>long
> >> >> as
> >> >> > >>>it
> >> >> > >>> > >>makes
> >> >> > >>> > >> >> >>sense
> >> >> > >>> > >> >> >> >>to
> >> >> > >>> > >> >> >> >> > us,
> >> >> > >>> > >> >> >> >> > > >>it
> >> >> > >>> > >> >> >> >> > > >> is less a problem to be different from
> >>java
> >> >> > >>> convention.
> >> >> > >>> > >> >>That
> >> >> > >>> > >> >> >>said
> >> >> > >>> > >> >> >> >> > since
> >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> >> >> interface,
> >> >> > >>>I
> >> >> > >>> > >>think
> >> >> > >>> > >> >>it
> >> >> > >>> > >> >> >> >>also
> >> >> > >>> > >> >> >> >> > makes
> >> >> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
> >> >>same
> >> >> > >>> > >>definition
> >> >> > >>> > >> >>of
> >> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
> >> >>close
> >> >> > >>>would
> >> >> > >>> > >>be
> >> >> > >>> > >> >> >> >>separating
> >> >> > >>> > >> >> >> >> > the
> >> >> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> >> >> probably
> >> >> > >>> makes
> >> >> > >>> > >> >> >>sense. I
> >> >> > >>> > >> >> >> >> > would
> >> >> > >>> > >> >> >> >> > > >> guess typically the request timeout would
> >>be
> >> >> long
> >> >> > >>> (e.g.
> >> >> > >>> > >>60
> >> >> > >>> > >> >> >> >>seconds)
> >> >> > >>> > >> >> >> >> > > >> because we might want to consider retries
> >> >>with
> >> >> > >>>back
> >> >> > >>> off
> >> >> > >>> > >> >>time.
> >> >> > >>> > >> >> >>If
> >> >> > >>> > >> >> >> >>we
> >> >> > >>> > >> >> >> >> > have
> >> >> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
> >> >>case
> >> >> > >>>that
> >> >> > >>> > >>could
> >> >> > >>> > >> >>take
> >> >> > >>> > >> >> >> >>up to
> >> >> > >>> > >> >> >> >> > > >> several minutes to complete all the
> >>requests.
> >> >> But
> >> >> > >>>when
> >> >> > >>> > >>we
> >> >> > >>> > >> >> >>close a
> >> >> > >>> > >> >> >> >> > > >> producer, we might not want to wait for
> >>that
> >> >> long
> >> >> > >>>as
> >> >> > >>> it
> >> >> > >>> > >> >>might
> >> >> > >>> > >> >> >> >>cause
> >> >> > >>> > >> >> >> >> > some
> >> >> > >>> > >> >> >> >> > > >> other problem like deployment tool
> >>timeout.
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> There is also a subtle difference between
> >> >> > >>> close(timeout)
> >> >> > >>> > >> >>and
> >> >> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
> >>flush()
> >> >>is
> >> >> to
> >> >> > >>> write
> >> >> > >>> > >> >>data
> >> >> > >>> > >> >> >>to
> >> >> > >>> > >> >> >> >>the
> >> >> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
> >> >>until
> >> >> > >>> request
> >> >> > >>> > >> >> >>timeout.
> >> >> > >>> > >> >> >> >>I
> >> >> > >>> > >> >> >> >> > think
> >> >> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange.
> >>On
> >> >>the
> >> >> > >>>other
> >> >> > >>> > >> >>hand,
> >> >> > >>> > >> >> >>the
> >> >> > >>> > >> >> >> >>top
> >> >> > >>> > >> >> >> >> > > >> priority for close() is to close the
> >>producer
> >> >> > >>>rather
> >> >> > >>> > >>than
> >> >> > >>> > >> >> >>flush()
> >> >> > >>> > >> >> >> >> > data,
> >> >> > >>> > >> >> >> >> > > >>so
> >> >> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> >> >> waiting
> >> >> > >>>for
> >> >> > >>> > >>its
> >> >> > >>> > >> >>main
> >> >> > >>> > >> >> >> >>job.
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
> >> >>flag.
> >> >> It
> >> >> > >>>is
> >> >> > >>> > >>not a
> >> >> > >>> > >> >> >>public
> >> >> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
> >>Changes
> >> >> > >>>section
> >> >> > >>> > >> >>which I
> >> >> > >>> > >> >> >> >> thought
> >> >> > >>> > >> >> >> >> > > >>was
> >> >> > >>> > >> >> >> >> > > >> supposed to provide implementation
> >>details.
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> >> >> suggestions!
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> >> >> > >>> jqin@linkedin.com <javascript:;>>
> >> >> > >>> > >> >> wrote:
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> >> >> comments.
> >> >> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
> >>if
> >> >>no
> >> >> > >>>further
> >> >> > >>> > >> >> >>comments
> >> >> > >>> > >> >> >> >>are
> >> >> > >>> > >> >> >> >> > > >> >received by tomorrow.
> >> >> > >>> > >> >> >> >> > > >> >
> >> >> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> >> > >>> > >> >> >> >> > > >> >
> >> >> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> >> >> > >>><jay.kreps@gmail.com <javascript:;>
> >> >> > >>> >
> >> >> > >>> > >> >>wrote:
> >> >> > >>> > >> >> >> >> > > >> >
> >> >> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and
> >>use
> >> >> > >>>cases
> >> >> > >>> for
> >> >> > >>> > >>the
> >> >> > >>> > >> >> >> >>feature?
> >> >> > >>> > >> >> >> >> > > >>This
> >> >> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way
> >>of
> >> >> > >>>aborting
> >> >> > >>> > >>from
> >> >> > >>> > >> >> >>inside
> >> >> > >>> > >> >> >> >>the
> >> >> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
> >> >>that
> >> >> > >>>usage
> >> >> > >>> or
> >> >> > >>> > >>why
> >> >> > >>> > >> >> >>other
> >> >> > >>> > >> >> >> >> > people
> >> >> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
> >>list
> >> >> the
> >> >> > >>> > >>primary
> >> >> > >>> > >> >>use
> >> >> > >>> > >> >> >> >>case
> >> >> > >>> > >> >> >> >> for
> >> >> > >>> > >> >> >> >> > > >> >>having
> >> >> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was
> >>to
> >> >> avoid
> >> >> > >>> > >>blocking
> >> >> > >>> > >> >>too
> >> >> > >>> > >> >> >> >>long
> >> >> > >>> > >> >> >> >> on
> >> >> > >>> > >> >> >> >> > > >> >>shutdown.
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>-Jay
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
> >>Jiangjie
> >> >>Qin
> >> >> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> >> >> > >>> > >> >> >> >> > > >> >
> >> >> > >>> > >> >> >> >> > > >> >>wrote:
> >> >> > >>> > >> >> >> >> > > >> >>
> >> >> > >>> > >> >> >> >> > > >> >>> Hi,
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> >> >> > >>>close(timeout)
> >> >> > >>> to
> >> >> > >>> > >>new
> >> >> > >>> > >> >> >> >> producer.
> >> >> > >>> > >> >> >> >> > > >>Most
> >> >> > >>> > >> >> >> >> > > >> >>>of
> >> >> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
> >> >>KAFKA-1660
> >> >> > >>>where
> >> >> > >>> > >>Parth
> >> >> > >>> > >> >> >> >> Brahmbhatt
> >> >> > >>> > >> >> >> >> > > >>has
> >> >> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> >> >> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
> >>we
> >> >>are
> >> >> > >>>going
> >> >> > >>> > >> >>through
> >> >> > >>> > >> >> >>the
> >> >> > >>> > >> >> >> >> KIP
> >> >> > >>> > >> >> >> >> > > >> >>>process.
> >> >> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >>
> >> >> > >>> > >>
> >> >> > >>> >
> >> >> > >>>
> >> >> >
> >>
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=537397
> >>>>8
> >> >> > >>> > >> >> >> >> > > >> >>>2
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >> >>> Thanks.
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> >> > >>> > >> >> >> >> > > >> >>>
> >> >> > >>> > >> >> >> >> > > >> >
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > > >>
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> > >
> >> >> > >>> > >> >> >> >> >
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >> --
> >> >> > >>> > >> >> >> >> Thanks,
> >> >> > >>> > >> >> >> >> Neha
> >> >> > >>> > >> >> >> >>
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >
> >> >> > >>> > >> >> >> >--
> >> >> > >>> > >> >> >> >-- Guozhang
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >>
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >
> >> >> > >>> > >> >> >--
> >> >> > >>> > >> >> >-- Guozhang
> >> >> > >>> > >> >>
> >> >> > >>> > >> >>
> >> >> > >>> > >> >
> >> >> > >>> > >> >
> >> >> > >>> > >> >--
> >> >> > >>> > >> >-- Guozhang
> >> >> > >>> > >>
> >> >> > >>> > >>
> >> >> > >>> >
> >> >> > >>> >
> >> >> > >>>
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> > >>--
> >> >> > >>-- Guozhang
> >> >> > >
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >> >--
> >> >Sent from Gmail Mobile
> >>
> >>
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

That’s a very good point.
Currently if timeout > 0, it will wait up to timeout for the sender thread
to complete. After that it will do a force close but not block.

Maybe we should make it to be:
If timeout > 0, wait up to timeout for the sender thread to complete.
After that, make a force close but still try to join the sender thread.
If timeout = 0, it is non-blocking.

Jiangjie (Becket) Qin

On 3/20/15, 9:16 AM, "Jun Rao" <ju...@confluent.io> wrote:

>Got it. Just to clarify, does close(timeout) always wait for the sender
>thread to complete?
>
>Thanks,
>
>Jun
>
>On Fri, Mar 20, 2015 at 7:41 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> I think the we agreed that we are going to log an error and block. By
>> doing this we can make sure the error log to be checked by user in all
>> cases.
>> If we silently replace close() to close(0) in sender thread, in some
>>cases
>> such as send error during a normal close(), user might not notice
>> something went wrong.
>>
>> On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:
>>
>> >close can probably just getCurrentThread and check if == senderThread.
>> >I'm actually not sure from this thread if there was clear agreement on
>> >whether it should change close(timeout)/close() to close(0) or if it
>> >should
>> >log an error and block up to the timeout.
>> >
>> >On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
>> >
>> >> So in (1), if a close() or close(timeout) is called from a callback,
>>we
>> >> will just turn that into a close(0)? Implementation wise, how do we
>>know
>> >> whether a close() call is made from the sender thread or not?
>> >>
>> >> Thanks,
>> >>
>> >> Jun
>> >>
>> >> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
>> >><jq...@linkedin.com.invalid>
>> >> wrote:
>> >>
>> >> > It looks we have another option and are now deciding between the
>> >> following
>> >> > two interfaces:
>> >> >
>> >> > 1. Close() + close(timeout)
>> >> >   - timeout could be either positive or zero.
>> >> >   - only close(0) can be called from sender thread
>> >> >
>> >> > 2. Close() + abort() + close(timeout)
>> >> >   - timeout can either be positive or zero
>> >> >   - only abort() can be called from sender thread
>> >> >
>> >> >   - abort() is equivalent to close(0) in 1) but does not join
>>sender
>> >> > thread and does not close metrics.
>> >> >   - Another thread has to call close() or close(timeout) in order
>>to
>> >>make
>> >> > sure the resources in producer are gone.
>> >> >
>> >> > The tow approach provides the same function we need, the
>>difference is
>> >> > approach 2) follows convention of close() and abort(). On the other
>> >>hand,
>> >> > approach 1) saves one interface compared with approach 2) but does
>>not
>> >> > follow the convention.
>> >> >
>> >> > When the two approaches come to user code, it is probably something
>> >>like
>> >> > this:
>> >> >
>> >> > Try {
>> >> >   While(!finished)
>> >> >     Producer.send(record, callback)
>> >> > } catch (Exception e) {
>> >> >   Producer.close(5)
>> >> > }
>> >> >
>> >> > Class CallbackImpl implements Callback {
>> >> >   onCompletion(RecordMetadata metadata Exception e) {
>> >> >     If (e != null)
>> >> >       Abort() / close()
>> >> >   }
>> >> > }
>> >> >
>> >> > Because the two approach leads to almost the same user code,
>>assuming
>> >> > users are always calling producer.close() as a clean up step,
>> >>personally
>> >> I
>> >> > prefer approach 2) as it follows convention.
>> >> >
>> >> > Any thoughts?
>> >> >
>> >> > Jiangjie (Becket) Qin
>> >> >
>> >> >
>> >> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
>> >><javascript:;>>
>> >> wrote:
>> >> >
>> >> > >Hi Jun,
>> >> > >
>> >> > >Yes, as Guozhang said, the main reason we set a flag is because
>> >>close(0)
>> >> > >is expected to be called by sender thread itself.
>> >> > >If we want to maintain the semantic meaning of close(), one
>> >>alternative
>> >> is
>> >> > >to have an abort() method does the same thing as close(0) except
>> >> cleanup.
>> >> > >And in close(timeout), after timeout we call abort() and join the
>> >>sender
>> >> > >thread. This was one of the previous proposal. We merged abort to
>> >> close(0)
>> >> > >because they are almost doing the same thing. But from what you
>> >> mentioned,
>> >> > >it might make sense to have two separate methods.
>> >> > >
>> >> > >Thanks.
>> >> > >
>> >> > >Jiangjie (Becket) Qin
>> >> > >
>> >> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
>> >> <javascript:;>> wrote:
>> >> > >
>> >> > >>Yeah in this sense the sender thread will not exist immediately
>>in
>> >>the
>> >> > >>close(0) call, but will only terminate after the current response
>> >>batch
>> >> > >>has
>> >> > >>been processed, as will the producer instance itself.
>> >> > >>
>> >> > >>There is a reason for this though: for a clean shutdown the
>>caller
>> >> thread
>> >> > >>has to wait for the sender thread to join before closing the
>> >>producer
>> >> > >>instance, but this cannot be achieve if close(0) is called by the
>> >> sender
>> >> > >>thread itself (for example in KAFKA-1659, there is a proposal
>>from
>> >> Andrew
>> >> > >>Stein on using thread.interrupt and thread.stop, but if it is
>> >>called by
>> >> > >>the
>> >> > >>ioThread itself the stop call will fail). Hence we came up with
>>the
>> >> flag
>> >> > >>approach to let the sender thread to close as soon as it is at
>>the
>> >> > >>barrier
>> >> > >>of the run loop.
>> >> > >>
>> >> > >>Guozhang
>> >> > >>
>> >> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
>> >> <javascript:;>> wrote:
>> >> > >>
>> >> > >>> Hmm, does that mean that after close(0), the sender thread is
>>not
>> >> > >>>necessary
>> >> > >>> gone? Normally, after closing an entity, we expect all internal
>> >> threads
>> >> > >>> associated with the entity are shut down completely.
>> >> > >>>
>> >> > >>> Thanks,
>> >> > >>>
>> >> > >>> Jun
>> >> > >>>
>> >> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>> >> > >>><jq...@linkedin.com.invalid>
>> >> > >>> wrote:
>> >> > >>>
>> >> > >>> > Hi Jun,
>> >> > >>> >
>> >> > >>> > Close(0) will set two flags in sender. Running=false and a
>>newly
>> >> > >>>added
>> >> > >>> > forceClose=true. It will also set accumulator.closed=true so
>>no
>> >> > >>>further
>> >> > >>> > producer.send() will succeed.
>> >> > >>> > The sender thread will finish executing all the callbacks in
>> >> current
>> >> > >>> batch
>> >> > >>> > of responses, then it will see the forceClose flag. It will
>>just
>> >> fail
>> >> > >>>all
>> >> > >>> > the incomplete batches in the producer and exit.
>> >> > >>> > So close(0) is a non-blocking call and sender thread will not
>> >>try
>> >> to
>> >> > >>>join
>> >> > >>> > itself in close(0).
>> >> > >>> >
>> >> > >>> > Thanks.
>> >> > >>> >
>> >> > >>> > Jiangjie (Becket) Qin
>> >> > >>> >
>> >> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io
>> <javascript:;>>
>> >> wrote:
>> >> > >>> >
>> >> > >>> > >How does close(0) work if it's called from the sender
>>thread?
>> >>If
>> >> > >>> close(0)
>> >> > >>> > >needs to wait for the sender thread to join, wouldn't this
>> >>cause a
>> >> > >>> > >deadlock?
>> >> > >>> > >
>> >> > >>> > >Thanks,
>> >> > >>> > >
>> >> > >>> > >Jun
>> >> > >>> > >
>> >> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>> >> > >>><jqin@linkedin.com.invalid
>> >> > >>> >
>> >> > >>> > >wrote:
>> >> > >>> > >
>> >> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
>> >>without
>> >> > >>> > >> discussing with you :)
>> >> > >>> > >>
>> >> > >>> > >> Jiangjie (Becket) Qin
>> >> > >>> > >>
>> >> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangguoz@gmail.com
>> >> <javascript:;>> wrote:
>> >> > >>> > >>
>> >> > >>> > >> >Thanks Jiangjie,
>> >> > >>> > >> >
>> >> > >>> > >> >After talking to you offline on this, I have been
>>convinced
>> >>and
>> >> > >>> > >>changed my
>> >> > >>> > >> >preference to blocking. The immediate shutdown approach
>>does
>> >> have
>> >> > >>> some
>> >> > >>> > >> >unsafeness in some cases.
>> >> > >>> > >> >
>> >> > >>> > >> >Guozhang
>> >> > >>> > >> >
>> >> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>> >> > >>> > >><jqin@linkedin.com.invalid
>> >> > >>> > >> >
>> >> > >>> > >> >wrote:
>> >> > >>> > >> >
>> >> > >>> > >> >> It looks that the problem we want to solve and the
>> >>purpose we
>> >> > >>>want
>> >> > >>> to
>> >> > >>> > >> >> achieve is:
>> >> > >>> > >> >> If user uses close() in callback, we want to let user
>>be
>> >> aware
>> >> > >>>that
>> >> > >>> > >>they
>> >> > >>> > >> >> should use close(0) instead of close() in the callback.
>> >> > >>> > >> >>
>> >> > >>> > >> >> We have agreed that we will have an error log to inform
>> >>user
>> >> > >>>about
>> >> > >>> > >>this
>> >> > >>> > >> >> mis-usage. The options differ in the way how we can
>>force
>> >> user
>> >> > >>>to
>> >> > >>> > >>take a
>> >> > >>> > >> >> look at that error log.
>> >> > >>> > >> >> There are two scenarios:
>> >> > >>> > >> >> 1. User does not expect the program to exit.
>> >> > >>> > >> >> 2. User expect the program to exit.
>> >> > >>> > >> >>
>> >> > >>> > >> >> For scenario 1), blocking will probably delay the
>> >>discovery
>> >> of
>> >> > >>>the
>> >> > >>> > >> >> problem. Calling close(0) exposes the problem quicker.
>>In
>> >> this
>> >> > >>> > >>scenario
>> >> > >>> > >> >> producer just encounter a send failure when running
>> >>normally.
>> >> > >>> > >> >> For scenario 2), blocking will expose the problem
>>quick.
>> >> > >>>Calling
>> >> > >>> > >> >>close(-1)
>> >> > >>> > >> >> might hide the problem. This scenario might include: a)
>> >>Unit
>> >> > >>>test
>> >> > >>> > >>for a
>> >> > >>> > >> >> send failure. b) Message sending during a close() call
>> >>from a
>> >> > >>>user
>> >> > >>> > >> >>thread.
>> >> > >>> > >> >>
>> >> > >>> > >> >> So as a summary table:
>> >> > >>> > >> >>
>> >> > >>> > >> >>                   Scenario 1)
>> >>  Scenario
>> >> > >>>2)
>> >> > >>> > >> >>
>> >> > >>> > >> >> Blocking      Delay problem discovery
>>Guaranteed
>> >> > >>>problem
>> >> > >>> > >> >>discovery
>> >> > >>> > >> >>
>> >> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
>> >>might
>> >> be
>> >> > >>> hidden
>> >> > >>> > >> >>
>> >> > >>> > >> >>
>> >> > >>> > >> >> Personally I prefer blocking because it seems providing
>> >>more
>> >> > >>> > >>guarantees
>> >> > >>> > >> >> and safer.
>> >> > >>> > >> >>
>> >> > >>> > >> >> Thanks.
>> >> > >>> > >> >>
>> >> > >>> > >> >> Jiangjie (Becket) Qin
>> >> > >>> > >> >>
>> >> > >>> > >> >>
>> >> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang"
>><wangguoz@gmail.com
>> >> <javascript:;>>
>> >> > >>>wrote:
>> >> > >>> > >> >>
>> >> > >>> > >> >> >HI Jiangjie,
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >As far as I understand calling close() in the
>>ioThread is
>> >> not
>> >> > >>> > >>common,
>> >> > >>> > >> >>as
>> >> > >>> > >> >> >it
>> >> > >>> > >> >> >may only trigger when we saw some non-retriable error.
>> >>Hence
>> >> > >>>when
>> >> > >>> > >>user
>> >> > >>> > >> >>run
>> >> > >>> > >> >> >their program it is unlikely that close() will be
>> >>triggered
>> >> > >>>and
>> >> > >>> > >>problem
>> >> > >>> > >> >> >will be detected. So it seems to me that from the
>>error
>> >> > >>>detection
>> >> > >>> > >> >>aspect
>> >> > >>> > >> >> >these two options seems to be the same as people will
>> >> usually
>> >> > >>> > >>detect it
>> >> > >>> > >> >> >from the producer metrics all dropping to 0.
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >Guozhang
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> >> > >>> > >> >><jq...@linkedin.com.invalid>
>> >> > >>> > >> >> >wrote:
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >> It seems there are two options we can choose from
>>when
>> >> > >>>close()
>> >> > >>> is
>> >> > >>> > >> >>called
>> >> > >>> > >> >> >> from sender thread (callback):
>> >> > >>> > >> >> >> 1. Log an error and close the producer using
>>close(-1)
>> >> > >>> > >> >> >> 2. Log an error and block.
>> >> > >>> > >> >> >> (Throwing an exception will not work because we
>>catch
>> >>all
>> >> > >>>the
>> >> > >>> > >> >>exception
>> >> > >>> > >> >> >> thrown from user callback. It will just lead to an
>> >>error
>> >> > >>>log.)
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> My concern for the first option is that the producer
>> >>will
>> >> be
>> >> > >>> > >>closed
>> >> > >>> > >> >>even
>> >> > >>> > >> >> >> if we logged and error. I am wondering if some user
>> >>would
>> >> > >>>not
>> >> > >>> even
>> >> > >>> > >> >>take
>> >> > >>> > >> >> >>a
>> >> > >>> > >> >> >> look at the log if producer is closed normally.
>>Because
>> >> from
>> >> > >>>the
>> >> > >>> > >> >> >>programs
>> >> > >>> > >> >> >> behavior, everything looks good. If that is the
>>case,
>> >>the
>> >> > >>>error
>> >> > >>> > >> >>message
>> >> > >>> > >> >> >>we
>> >> > >>> > >> >> >> logged probably will just be ignored until some day
>> >>when
>> >> > >>>people
>> >> > >>> > >>check
>> >> > >>> > >> >> >>the
>> >> > >>> > >> >> >> log and see it.
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> As for the second option, because producer does not
>> >>close
>> >> > >>>but
>> >> > >>> > >>blocks.
>> >> > >>> > >> >> >>User
>> >> > >>> > >> >> >> will notice this the first time they run the
>>program.
>> >>They
>> >> > >>> > >>probably
>> >> > >>> > >> >>will
>> >> > >>> > >> >> >> look at the log to see why producer could not be
>>closed
>> >> and
>> >> > >>>they
>> >> > >>> > >>will
>> >> > >>> > >> >> >>see
>> >> > >>> > >> >> >> the error log we put there. So they will get
>>informed
>> >> about
>> >> > >>>this
>> >> > >>> > >> >> >>mis-usage
>> >> > >>> > >> >> >> of close() in sender thread the first time they run
>>the
>> >> code
>> >> > >>> > >>instead
>> >> > >>> > >> >>of
>> >> > >>> > >> >> >> some time later.
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> Personally I prefer the second one because it is
>>more
>> >> > >>>obvious
>> >> > >>> that
>> >> > >>> > >> >> >> something was wrong.
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> Jiangjie (Becket) Qin
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
>> >><wangguoz@gmail.com
>> >> <javascript:;>>
>> >> > >>> wrote:
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >> >Yeah I agree we should not silently change the
>> >>behavior
>> >> of
>> >> > >>>the
>> >> > >>> > >> >>function
>> >> > >>> > >> >> >> >with the given parameters; and I would prefer
>> >> > >>> > >> >> >>error-logging-and-shutdown
>> >> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
>> >> > >>>suggested
>> >> > >>> > >> >>blocking
>> >> > >>> > >> >> >> >would also not proceed with sending any data, bu
>>will
>> >> just
>> >> > >>>let
>> >> > >>> > >> >>users to
>> >> > >>> > >> >> >> >realize the issue later than sooner.
>> >> > >>> > >> >> >> >
>> >> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>> >> > >>> > >><neha@confluent.io <javascript:;>>
>> >> > >>> > >> >> >>wrote:
>> >> > >>> > >> >> >> >
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > And I also agree it is better if we can make
>> >>producer
>> >> > >>>block
>> >> > >>> > >>when
>> >> > >>> > >> >> >> >> > close() is called from sender thread so user
>>will
>> >> > >>>notice
>> >> > >>> > >> >>something
>> >> > >>> > >> >> >> >>went
>> >> > >>> > >> >> >> >> > wrong.
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >> This isn't a great experience either. Why can't
>>we
>> >>just
>> >> > >>>throw
>> >> > >>> > >>an
>> >> > >>> > >> >> >> >>exception
>> >> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
>> >>the
>> >> > >>>user to
>> >> > >>> > >> >>know.
>> >> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
>> >> > >>>annoying.
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>> >> > >>> > >><jay.kreps@gmail.com <javascript:;>>
>> >> > >>> > >> >> >> wrote:
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >> > Cool.
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > I think blocking is good or alternately
>>throwing
>> >>an
>> >> > >>> exception
>> >> > >>> > >> >> >>directly
>> >> > >>> > >> >> >> >> from
>> >> > >>> > >> >> >> >> > close(). Basically I would just worry about
>>subtly
>> >> > >>>doing
>> >> > >>> > >> >>something
>> >> > >>> > >> >> >> >> slightly
>> >> > >>> > >> >> >> >> > different from what the user asked for as it
>>will
>> >>be
>> >> > >>>hard
>> >> > >>> to
>> >> > >>> > >> >>notice
>> >> > >>> > >> >> >> >>that
>> >> > >>> > >> >> >> >> > behavior difference.
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > -Jay
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> >> > >>> > >> >> >> >><jqin@linkedin.com.invalid
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > wrote:
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >> > > Hi Jay,
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
>> >>thinks
>> >> as
>> >> > >>> long
>> >> > >>> > >>as
>> >> > >>> > >> >>we
>> >> > >>> > >> >> >> >>have
>> >> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
>> >> interface,
>> >> > >>> there
>> >> > >>> > >> >>would
>> >> > >>> > >> >> >> >>be no
>> >> > >>> > >> >> >> >> > > problem. And I also agree it is better if we
>>can
>> >> make
>> >> > >>> > >>producer
>> >> > >>> > >> >> >>block
>> >> > >>> > >> >> >> >> when
>> >> > >>> > >> >> >> >> > > close() is called from sender thread so user
>> >>will
>> >> > >>>notice
>> >> > >>> > >> >> >>something
>> >> > >>> > >> >> >> >>went
>> >> > >>> > >> >> >> >> > > wrong.
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > Thanks.
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>> >> > >>><jay.kreps@gmail.com <javascript:;>>
>> >> > >>> > >> wrote:
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > >Hey Jiangjie,
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >I think this is going to be very confusing
>>that
>> >> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
>> >> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
>> >> > >>> > >> >> >> >> > > >I understand this appears in other apis,
>>but it
>> >> is a
>> >> > >>> > >>constant
>> >> > >>> > >> >> >> >>cause of
>> >> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't
>>need a
>> >> way
>> >> > >>>to
>> >> > >>> > >>wait
>> >> > >>> > >> >> >> >> indefinitely
>> >> > >>> > >> >> >> >> > > >as
>> >> > >>> > >> >> >> >> > > >we already have close() so having a magical
>> >> constant
>> >> > >>>for
>> >> > >>> > >> >>that is
>> >> > >>> > >> >> >> >> > > >redundant.
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was
>>already
>> >> > >>>possible
>> >> > >>> > >>and
>> >> > >>> > >> >> >>would
>> >> > >>> > >> >> >> >> block
>> >> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently
>>change
>> >> the
>> >> > >>> > >>behavior
>> >> > >>> > >> >>is
>> >> > >>> > >> >> >> >> probably
>> >> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
>> >>the
>> >> > >>> callback
>> >> > >>> > >> >>there
>> >> > >>> > >> >> >>is
>> >> > >>> > >> >> >> >> > > >actually
>> >> > >>> > >> >> >> >> > > >some misunderstanding and they need to think
>> >>more,
>> >> > >>> > >>silently
>> >> > >>> > >> >> >>making
>> >> > >>> > >> >> >> >> this
>> >> > >>> > >> >> >> >> > > >not
>> >> > >>> > >> >> >> >> > > >block will hide the problem from them which
>>is
>> >>the
>> >> > >>> > >>opposite
>> >> > >>> > >> >>of
>> >> > >>> > >> >> >> >>what we
>> >> > >>> > >> >> >> >> > > >want.
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >-Jay
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie
>>Qin
>> >> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > > >wrote:
>> >> > >>> > >> >> >> >> > > >
>> >> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting
>>thread.
>> >> > >>>Since it
>> >> > >>> > >> >>seems
>> >> > >>> > >> >> >>we
>> >> > >>> > >> >> >> >> > probably
>> >> > >>> > >> >> >> >> > > >> will have more discussion on this, I am
>>just
>> >> > >>>replying
>> >> > >>> > >>from
>> >> > >>> > >> >>the
>> >> > >>> > >> >> >> >> > > >>discussion
>> >> > >>> > >> >> >> >> > > >> thread here.
>> >> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
>> >>like
>> >> > >>> > >>half-baked,
>> >> > >>> > >> >> >> >>apologize
>> >> > >>> > >> >> >> >> > for
>> >> > >>> > >> >> >> >> > > >> the rush...
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> The contract in current KIP is:
>> >> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests
>>either
>> >> are
>> >> > >>>sent
>> >> > >>> > >>or
>> >> > >>> > >> >> >>reach
>> >> > >>> > >> >> >> >> > request
>> >> > >>> > >> >> >> >> > > >> timeout.
>> >> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) -
>>close
>> >> > >>> > >>immediately
>> >> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
>> >> equivalent
>> >> > >>>to
>> >> > >>> > >> >>close(),
>> >> > >>> > >> >> >> >>i.e.
>> >> > >>> > >> >> >> >> > Wait
>> >> > >>> > >> >> >> >> > > >> until all requests are sent or reach
>>request
>> >> > >>>timeout
>> >> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
>> >>the
>> >> > >>>best to
>> >> > >>> > >> >>finish
>> >> > >>> > >> >> >> >> sending
>> >> > >>> > >> >> >> >> > > >>in 5
>> >> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong,
>>just
>> >> > >>>shutdown
>> >> > >>> the
>> >> > >>> > >> >> >>producer
>> >> > >>> > >> >> >> >> > > >>anyway,
>> >> > >>> > >> >> >> >> > > >> my callback will handle the failures.
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> About how we define what timeout value
>>stands
>> >> for,
>> >> > >>>I
>> >> > >>> > >> >>actually
>> >> > >>> > >> >> >> >> > struggled
>> >> > >>> > >> >> >> >> > > >>a
>> >> > >>> > >> >> >> >> > > >> little bit when wrote the patch.
>>Intuitively,
>> >> > >>>close(0)
>> >> > >>> > >> >>should
>> >> > >>> > >> >> >> >>mean
>> >> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
>> >> > >>>existing
>> >> > >>> java
>> >> > >>> > >> >>class
>> >> > >>> > >> >> >> >>have
>> >> > >>> > >> >> >> >> > this
>> >> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout
>>or
>> >> never
>> >> > >>> > >>timeout
>> >> > >>> > >> >> >> >> > > >>(Thread.join(0),
>> >> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma
>>is
>> >> > >>>either we
>> >> > >>> > >> >>follow
>> >> > >>> > >> >> >>the
>> >> > >>> > >> >> >> >> > > >> intuition or we follow the convention.
>>What I
>> >> > >>>chose is
>> >> > >>> > >>to
>> >> > >>> > >> >> >>follow
>> >> > >>> > >> >> >> >>the
>> >> > >>> > >> >> >> >> > > >> convention but document the interface to
>>let
>> >> user
>> >> > >>>be
>> >> > >>> > >>aware
>> >> > >>> > >> >>of
>> >> > >>> > >> >> >>the
>> >> > >>> > >> >> >> >> > usage.
>> >> > >>> > >> >> >> >> > > >> The reason is that I think
>>producer.close()
>> >>is a
>> >> > >>> public
>> >> > >>> > >> >> >> >>interface so
>> >> > >>> > >> >> >> >> > it
>> >> > >>> > >> >> >> >> > > >> might be better to follow java convention.
>> >> Whereas
>> >> > >>> > >> >>selector is
>> >> > >>> > >> >> >> >>not a
>> >> > >>> > >> >> >> >> > > >> public interface that used by user, so as
>> >>long
>> >> as
>> >> > >>>it
>> >> > >>> > >>makes
>> >> > >>> > >> >> >>sense
>> >> > >>> > >> >> >> >>to
>> >> > >>> > >> >> >> >> > us,
>> >> > >>> > >> >> >> >> > > >>it
>> >> > >>> > >> >> >> >> > > >> is less a problem to be different from
>>java
>> >> > >>> convention.
>> >> > >>> > >> >>That
>> >> > >>> > >> >> >>said
>> >> > >>> > >> >> >> >> > since
>> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
>> >> interface,
>> >> > >>>I
>> >> > >>> > >>think
>> >> > >>> > >> >>it
>> >> > >>> > >> >> >> >>also
>> >> > >>> > >> >> >> >> > makes
>> >> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
>> >>same
>> >> > >>> > >>definition
>> >> > >>> > >> >>of
>> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
>> >>close
>> >> > >>>would
>> >> > >>> > >>be
>> >> > >>> > >> >> >> >>separating
>> >> > >>> > >> >> >> >> > the
>> >> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
>> >> probably
>> >> > >>> makes
>> >> > >>> > >> >> >>sense. I
>> >> > >>> > >> >> >> >> > would
>> >> > >>> > >> >> >> >> > > >> guess typically the request timeout would
>>be
>> >> long
>> >> > >>> (e.g.
>> >> > >>> > >>60
>> >> > >>> > >> >> >> >>seconds)
>> >> > >>> > >> >> >> >> > > >> because we might want to consider retries
>> >>with
>> >> > >>>back
>> >> > >>> off
>> >> > >>> > >> >>time.
>> >> > >>> > >> >> >>If
>> >> > >>> > >> >> >> >>we
>> >> > >>> > >> >> >> >> > have
>> >> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
>> >>case
>> >> > >>>that
>> >> > >>> > >>could
>> >> > >>> > >> >>take
>> >> > >>> > >> >> >> >>up to
>> >> > >>> > >> >> >> >> > > >> several minutes to complete all the
>>requests.
>> >> But
>> >> > >>>when
>> >> > >>> > >>we
>> >> > >>> > >> >> >>close a
>> >> > >>> > >> >> >> >> > > >> producer, we might not want to wait for
>>that
>> >> long
>> >> > >>>as
>> >> > >>> it
>> >> > >>> > >> >>might
>> >> > >>> > >> >> >> >>cause
>> >> > >>> > >> >> >> >> > some
>> >> > >>> > >> >> >> >> > > >> other problem like deployment tool
>>timeout.
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> There is also a subtle difference between
>> >> > >>> close(timeout)
>> >> > >>> > >> >>and
>> >> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for
>>flush()
>> >>is
>> >> to
>> >> > >>> write
>> >> > >>> > >> >>data
>> >> > >>> > >> >> >>to
>> >> > >>> > >> >> >> >>the
>> >> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
>> >>until
>> >> > >>> request
>> >> > >>> > >> >> >>timeout.
>> >> > >>> > >> >> >> >>I
>> >> > >>> > >> >> >> >> > think
>> >> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange.
>>On
>> >>the
>> >> > >>>other
>> >> > >>> > >> >>hand,
>> >> > >>> > >> >> >>the
>> >> > >>> > >> >> >> >>top
>> >> > >>> > >> >> >> >> > > >> priority for close() is to close the
>>producer
>> >> > >>>rather
>> >> > >>> > >>than
>> >> > >>> > >> >> >>flush()
>> >> > >>> > >> >> >> >> > data,
>> >> > >>> > >> >> >> >> > > >>so
>> >> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
>> >> waiting
>> >> > >>>for
>> >> > >>> > >>its
>> >> > >>> > >> >>main
>> >> > >>> > >> >> >> >>job.
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
>> >>flag.
>> >> It
>> >> > >>>is
>> >> > >>> > >>not a
>> >> > >>> > >> >> >>public
>> >> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed
>>Changes
>> >> > >>>section
>> >> > >>> > >> >>which I
>> >> > >>> > >> >> >> >> thought
>> >> > >>> > >> >> >> >> > > >>was
>> >> > >>> > >> >> >> >> > > >> supposed to provide implementation
>>details.
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
>> >> suggestions!
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>> >> > >>> jqin@linkedin.com <javascript:;>>
>> >> > >>> > >> >> wrote:
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
>> >> comments.
>> >> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process
>>if
>> >>no
>> >> > >>>further
>> >> > >>> > >> >> >>comments
>> >> > >>> > >> >> >> >>are
>> >> > >>> > >> >> >> >> > > >> >received by tomorrow.
>> >> > >>> > >> >> >> >> > > >> >
>> >> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>> >> > >>> > >> >> >> >> > > >> >
>> >> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>> >> > >>><jay.kreps@gmail.com <javascript:;>
>> >> > >>> >
>> >> > >>> > >> >>wrote:
>> >> > >>> > >> >> >> >> > > >> >
>> >> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and
>>use
>> >> > >>>cases
>> >> > >>> for
>> >> > >>> > >>the
>> >> > >>> > >> >> >> >>feature?
>> >> > >>> > >> >> >> >> > > >>This
>> >> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way
>>of
>> >> > >>>aborting
>> >> > >>> > >>from
>> >> > >>> > >> >> >>inside
>> >> > >>> > >> >> >> >>the
>> >> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
>> >>that
>> >> > >>>usage
>> >> > >>> or
>> >> > >>> > >>why
>> >> > >>> > >> >> >>other
>> >> > >>> > >> >> >> >> > people
>> >> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't
>>list
>> >> the
>> >> > >>> > >>primary
>> >> > >>> > >> >>use
>> >> > >>> > >> >> >> >>case
>> >> > >>> > >> >> >> >> for
>> >> > >>> > >> >> >> >> > > >> >>having
>> >> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was
>>to
>> >> avoid
>> >> > >>> > >>blocking
>> >> > >>> > >> >>too
>> >> > >>> > >> >> >> >>long
>> >> > >>> > >> >> >> >> on
>> >> > >>> > >> >> >> >> > > >> >>shutdown.
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>-Jay
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM,
>>Jiangjie
>> >>Qin
>> >> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>> >> > >>> > >> >> >> >> > > >> >
>> >> > >>> > >> >> >> >> > > >> >>wrote:
>> >> > >>> > >> >> >> >> > > >> >>
>> >> > >>> > >> >> >> >> > > >> >>> Hi,
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>> >> > >>>close(timeout)
>> >> > >>> to
>> >> > >>> > >>new
>> >> > >>> > >> >> >> >> producer.
>> >> > >>> > >> >> >> >> > > >>Most
>> >> > >>> > >> >> >> >> > > >> >>>of
>> >> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
>> >>KAFKA-1660
>> >> > >>>where
>> >> > >>> > >>Parth
>> >> > >>> > >> >> >> >> Brahmbhatt
>> >> > >>> > >> >> >> >> > > >>has
>> >> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
>> >> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so
>>we
>> >>are
>> >> > >>>going
>> >> > >>> > >> >>through
>> >> > >>> > >> >> >>the
>> >> > >>> > >> >> >> >> KIP
>> >> > >>> > >> >> >> >> > > >> >>>process.
>> >> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >>
>> >> > >>> > >> >>
>> >> > >>> > >>
>> >> > >>> >
>> >> > >>>
>> >> >
>> 
>>>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=537397
>>>>8
>> >> > >>> > >> >> >> >> > > >> >>>2
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >> >>> Thanks.
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> >> > >>> > >> >> >> >> > > >> >>>
>> >> > >>> > >> >> >> >> > > >> >
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > > >>
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> > >
>> >> > >>> > >> >> >> >> >
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >> --
>> >> > >>> > >> >> >> >> Thanks,
>> >> > >>> > >> >> >> >> Neha
>> >> > >>> > >> >> >> >>
>> >> > >>> > >> >> >> >
>> >> > >>> > >> >> >> >
>> >> > >>> > >> >> >> >
>> >> > >>> > >> >> >> >--
>> >> > >>> > >> >> >> >-- Guozhang
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >>
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >
>> >> > >>> > >> >> >--
>> >> > >>> > >> >> >-- Guozhang
>> >> > >>> > >> >>
>> >> > >>> > >> >>
>> >> > >>> > >> >
>> >> > >>> > >> >
>> >> > >>> > >> >--
>> >> > >>> > >> >-- Guozhang
>> >> > >>> > >>
>> >> > >>> > >>
>> >> > >>> >
>> >> > >>> >
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> > >>
>> >> > >>--
>> >> > >>-- Guozhang
>> >> > >
>> >> >
>> >> >
>> >>
>> >
>> >
>> >--
>> >Sent from Gmail Mobile
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

Got it. Just to clarify, does close(timeout) always wait for the sender
thread to complete?

Thanks,

Jun

On Fri, Mar 20, 2015 at 7:41 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> I think the we agreed that we are going to log an error and block. By
> doing this we can make sure the error log to be checked by user in all
> cases.
> If we silently replace close() to close(0) in sender thread, in some cases
> such as send error during a normal close(), user might not notice
> something went wrong.
>
> On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:
>
> >close can probably just getCurrentThread and check if == senderThread.
> >I'm actually not sure from this thread if there was clear agreement on
> >whether it should change close(timeout)/close() to close(0) or if it
> >should
> >log an error and block up to the timeout.
> >
> >On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
> >
> >> So in (1), if a close() or close(timeout) is called from a callback, we
> >> will just turn that into a close(0)? Implementation wise, how do we know
> >> whether a close() call is made from the sender thread or not?
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
> >><jq...@linkedin.com.invalid>
> >> wrote:
> >>
> >> > It looks we have another option and are now deciding between the
> >> following
> >> > two interfaces:
> >> >
> >> > 1. Close() + close(timeout)
> >> >   - timeout could be either positive or zero.
> >> >   - only close(0) can be called from sender thread
> >> >
> >> > 2. Close() + abort() + close(timeout)
> >> >   - timeout can either be positive or zero
> >> >   - only abort() can be called from sender thread
> >> >
> >> >   - abort() is equivalent to close(0) in 1) but does not join sender
> >> > thread and does not close metrics.
> >> >   - Another thread has to call close() or close(timeout) in order to
> >>make
> >> > sure the resources in producer are gone.
> >> >
> >> > The tow approach provides the same function we need, the difference is
> >> > approach 2) follows convention of close() and abort(). On the other
> >>hand,
> >> > approach 1) saves one interface compared with approach 2) but does not
> >> > follow the convention.
> >> >
> >> > When the two approaches come to user code, it is probably something
> >>like
> >> > this:
> >> >
> >> > Try {
> >> >   While(!finished)
> >> >     Producer.send(record, callback)
> >> > } catch (Exception e) {
> >> >   Producer.close(5)
> >> > }
> >> >
> >> > Class CallbackImpl implements Callback {
> >> >   onCompletion(RecordMetadata metadata Exception e) {
> >> >     If (e != null)
> >> >       Abort() / close()
> >> >   }
> >> > }
> >> >
> >> > Because the two approach leads to almost the same user code, assuming
> >> > users are always calling producer.close() as a clean up step,
> >>personally
> >> I
> >> > prefer approach 2) as it follows convention.
> >> >
> >> > Any thoughts?
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> >
> >> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
> >><javascript:;>>
> >> wrote:
> >> >
> >> > >Hi Jun,
> >> > >
> >> > >Yes, as Guozhang said, the main reason we set a flag is because
> >>close(0)
> >> > >is expected to be called by sender thread itself.
> >> > >If we want to maintain the semantic meaning of close(), one
> >>alternative
> >> is
> >> > >to have an abort() method does the same thing as close(0) except
> >> cleanup.
> >> > >And in close(timeout), after timeout we call abort() and join the
> >>sender
> >> > >thread. This was one of the previous proposal. We merged abort to
> >> close(0)
> >> > >because they are almost doing the same thing. But from what you
> >> mentioned,
> >> > >it might make sense to have two separate methods.
> >> > >
> >> > >Thanks.
> >> > >
> >> > >Jiangjie (Becket) Qin
> >> > >
> >> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
> >> <javascript:;>> wrote:
> >> > >
> >> > >>Yeah in this sense the sender thread will not exist immediately in
> >>the
> >> > >>close(0) call, but will only terminate after the current response
> >>batch
> >> > >>has
> >> > >>been processed, as will the producer instance itself.
> >> > >>
> >> > >>There is a reason for this though: for a clean shutdown the caller
> >> thread
> >> > >>has to wait for the sender thread to join before closing the
> >>producer
> >> > >>instance, but this cannot be achieve if close(0) is called by the
> >> sender
> >> > >>thread itself (for example in KAFKA-1659, there is a proposal from
> >> Andrew
> >> > >>Stein on using thread.interrupt and thread.stop, but if it is
> >>called by
> >> > >>the
> >> > >>ioThread itself the stop call will fail). Hence we came up with the
> >> flag
> >> > >>approach to let the sender thread to close as soon as it is at the
> >> > >>barrier
> >> > >>of the run loop.
> >> > >>
> >> > >>Guozhang
> >> > >>
> >> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
> >> <javascript:;>> wrote:
> >> > >>
> >> > >>> Hmm, does that mean that after close(0), the sender thread is not
> >> > >>>necessary
> >> > >>> gone? Normally, after closing an entity, we expect all internal
> >> threads
> >> > >>> associated with the entity are shut down completely.
> >> > >>>
> >> > >>> Thanks,
> >> > >>>
> >> > >>> Jun
> >> > >>>
> >> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> >> > >>><jq...@linkedin.com.invalid>
> >> > >>> wrote:
> >> > >>>
> >> > >>> > Hi Jun,
> >> > >>> >
> >> > >>> > Close(0) will set two flags in sender. Running=false and a newly
> >> > >>>added
> >> > >>> > forceClose=true. It will also set accumulator.closed=true so no
> >> > >>>further
> >> > >>> > producer.send() will succeed.
> >> > >>> > The sender thread will finish executing all the callbacks in
> >> current
> >> > >>> batch
> >> > >>> > of responses, then it will see the forceClose flag. It will just
> >> fail
> >> > >>>all
> >> > >>> > the incomplete batches in the producer and exit.
> >> > >>> > So close(0) is a non-blocking call and sender thread will not
> >>try
> >> to
> >> > >>>join
> >> > >>> > itself in close(0).
> >> > >>> >
> >> > >>> > Thanks.
> >> > >>> >
> >> > >>> > Jiangjie (Becket) Qin
> >> > >>> >
> >> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io
> <javascript:;>>
> >> wrote:
> >> > >>> >
> >> > >>> > >How does close(0) work if it's called from the sender thread?
> >>If
> >> > >>> close(0)
> >> > >>> > >needs to wait for the sender thread to join, wouldn't this
> >>cause a
> >> > >>> > >deadlock?
> >> > >>> > >
> >> > >>> > >Thanks,
> >> > >>> > >
> >> > >>> > >Jun
> >> > >>> > >
> >> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> >> > >>><jqin@linkedin.com.invalid
> >> > >>> >
> >> > >>> > >wrote:
> >> > >>> > >
> >> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
> >>without
> >> > >>> > >> discussing with you :)
> >> > >>> > >>
> >> > >>> > >> Jiangjie (Becket) Qin
> >> > >>> > >>
> >> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangguoz@gmail.com
> >> <javascript:;>> wrote:
> >> > >>> > >>
> >> > >>> > >> >Thanks Jiangjie,
> >> > >>> > >> >
> >> > >>> > >> >After talking to you offline on this, I have been convinced
> >>and
> >> > >>> > >>changed my
> >> > >>> > >> >preference to blocking. The immediate shutdown approach does
> >> have
> >> > >>> some
> >> > >>> > >> >unsafeness in some cases.
> >> > >>> > >> >
> >> > >>> > >> >Guozhang
> >> > >>> > >> >
> >> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >> > >>> > >><jqin@linkedin.com.invalid
> >> > >>> > >> >
> >> > >>> > >> >wrote:
> >> > >>> > >> >
> >> > >>> > >> >> It looks that the problem we want to solve and the
> >>purpose we
> >> > >>>want
> >> > >>> to
> >> > >>> > >> >> achieve is:
> >> > >>> > >> >> If user uses close() in callback, we want to let user be
> >> aware
> >> > >>>that
> >> > >>> > >>they
> >> > >>> > >> >> should use close(0) instead of close() in the callback.
> >> > >>> > >> >>
> >> > >>> > >> >> We have agreed that we will have an error log to inform
> >>user
> >> > >>>about
> >> > >>> > >>this
> >> > >>> > >> >> mis-usage. The options differ in the way how we can force
> >> user
> >> > >>>to
> >> > >>> > >>take a
> >> > >>> > >> >> look at that error log.
> >> > >>> > >> >> There are two scenarios:
> >> > >>> > >> >> 1. User does not expect the program to exit.
> >> > >>> > >> >> 2. User expect the program to exit.
> >> > >>> > >> >>
> >> > >>> > >> >> For scenario 1), blocking will probably delay the
> >>discovery
> >> of
> >> > >>>the
> >> > >>> > >> >> problem. Calling close(0) exposes the problem quicker. In
> >> this
> >> > >>> > >>scenario
> >> > >>> > >> >> producer just encounter a send failure when running
> >>normally.
> >> > >>> > >> >> For scenario 2), blocking will expose the problem quick.
> >> > >>>Calling
> >> > >>> > >> >>close(-1)
> >> > >>> > >> >> might hide the problem. This scenario might include: a)
> >>Unit
> >> > >>>test
> >> > >>> > >>for a
> >> > >>> > >> >> send failure. b) Message sending during a close() call
> >>from a
> >> > >>>user
> >> > >>> > >> >>thread.
> >> > >>> > >> >>
> >> > >>> > >> >> So as a summary table:
> >> > >>> > >> >>
> >> > >>> > >> >>                   Scenario 1)
> >>  Scenario
> >> > >>>2)
> >> > >>> > >> >>
> >> > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> >> > >>>problem
> >> > >>> > >> >>discovery
> >> > >>> > >> >>
> >> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
> >>might
> >> be
> >> > >>> hidden
> >> > >>> > >> >>
> >> > >>> > >> >>
> >> > >>> > >> >> Personally I prefer blocking because it seems providing
> >>more
> >> > >>> > >>guarantees
> >> > >>> > >> >> and safer.
> >> > >>> > >> >>
> >> > >>> > >> >> Thanks.
> >> > >>> > >> >>
> >> > >>> > >> >> Jiangjie (Becket) Qin
> >> > >>> > >> >>
> >> > >>> > >> >>
> >> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wangguoz@gmail.com
> >> <javascript:;>>
> >> > >>>wrote:
> >> > >>> > >> >>
> >> > >>> > >> >> >HI Jiangjie,
> >> > >>> > >> >> >
> >> > >>> > >> >> >As far as I understand calling close() in the ioThread is
> >> not
> >> > >>> > >>common,
> >> > >>> > >> >>as
> >> > >>> > >> >> >it
> >> > >>> > >> >> >may only trigger when we saw some non-retriable error.
> >>Hence
> >> > >>>when
> >> > >>> > >>user
> >> > >>> > >> >>run
> >> > >>> > >> >> >their program it is unlikely that close() will be
> >>triggered
> >> > >>>and
> >> > >>> > >>problem
> >> > >>> > >> >> >will be detected. So it seems to me that from the error
> >> > >>>detection
> >> > >>> > >> >>aspect
> >> > >>> > >> >> >these two options seems to be the same as people will
> >> usually
> >> > >>> > >>detect it
> >> > >>> > >> >> >from the producer metrics all dropping to 0.
> >> > >>> > >> >> >
> >> > >>> > >> >> >Guozhang
> >> > >>> > >> >> >
> >> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >> > >>> > >> >><jq...@linkedin.com.invalid>
> >> > >>> > >> >> >wrote:
> >> > >>> > >> >> >
> >> > >>> > >> >> >> It seems there are two options we can choose from when
> >> > >>>close()
> >> > >>> is
> >> > >>> > >> >>called
> >> > >>> > >> >> >> from sender thread (callback):
> >> > >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> >> > >>> > >> >> >> 2. Log an error and block.
> >> > >>> > >> >> >> (Throwing an exception will not work because we catch
> >>all
> >> > >>>the
> >> > >>> > >> >>exception
> >> > >>> > >> >> >> thrown from user callback. It will just lead to an
> >>error
> >> > >>>log.)
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> My concern for the first option is that the producer
> >>will
> >> be
> >> > >>> > >>closed
> >> > >>> > >> >>even
> >> > >>> > >> >> >> if we logged and error. I am wondering if some user
> >>would
> >> > >>>not
> >> > >>> even
> >> > >>> > >> >>take
> >> > >>> > >> >> >>a
> >> > >>> > >> >> >> look at the log if producer is closed normally. Because
> >> from
> >> > >>>the
> >> > >>> > >> >> >>programs
> >> > >>> > >> >> >> behavior, everything looks good. If that is the case,
> >>the
> >> > >>>error
> >> > >>> > >> >>message
> >> > >>> > >> >> >>we
> >> > >>> > >> >> >> logged probably will just be ignored until some day
> >>when
> >> > >>>people
> >> > >>> > >>check
> >> > >>> > >> >> >>the
> >> > >>> > >> >> >> log and see it.
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> As for the second option, because producer does not
> >>close
> >> > >>>but
> >> > >>> > >>blocks.
> >> > >>> > >> >> >>User
> >> > >>> > >> >> >> will notice this the first time they run the program.
> >>They
> >> > >>> > >>probably
> >> > >>> > >> >>will
> >> > >>> > >> >> >> look at the log to see why producer could not be closed
> >> and
> >> > >>>they
> >> > >>> > >>will
> >> > >>> > >> >> >>see
> >> > >>> > >> >> >> the error log we put there. So they will get informed
> >> about
> >> > >>>this
> >> > >>> > >> >> >>mis-usage
> >> > >>> > >> >> >> of close() in sender thread the first time they run the
> >> code
> >> > >>> > >>instead
> >> > >>> > >> >>of
> >> > >>> > >> >> >> some time later.
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> Personally I prefer the second one because it is more
> >> > >>>obvious
> >> > >>> that
> >> > >>> > >> >> >> something was wrong.
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> Jiangjie (Becket) Qin
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
> >><wangguoz@gmail.com
> >> <javascript:;>>
> >> > >>> wrote:
> >> > >>> > >> >> >>
> >> > >>> > >> >> >> >Yeah I agree we should not silently change the
> >>behavior
> >> of
> >> > >>>the
> >> > >>> > >> >>function
> >> > >>> > >> >> >> >with the given parameters; and I would prefer
> >> > >>> > >> >> >>error-logging-and-shutdown
> >> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> >> > >>>suggested
> >> > >>> > >> >>blocking
> >> > >>> > >> >> >> >would also not proceed with sending any data, bu will
> >> just
> >> > >>>let
> >> > >>> > >> >>users to
> >> > >>> > >> >> >> >realize the issue later than sooner.
> >> > >>> > >> >> >> >
> >> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >> > >>> > >><neha@confluent.io <javascript:;>>
> >> > >>> > >> >> >>wrote:
> >> > >>> > >> >> >> >
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > And I also agree it is better if we can make
> >>producer
> >> > >>>block
> >> > >>> > >>when
> >> > >>> > >> >> >> >> > close() is called from sender thread so user will
> >> > >>>notice
> >> > >>> > >> >>something
> >> > >>> > >> >> >> >>went
> >> > >>> > >> >> >> >> > wrong.
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >> This isn't a great experience either. Why can't we
> >>just
> >> > >>>throw
> >> > >>> > >>an
> >> > >>> > >> >> >> >>exception
> >> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
> >>the
> >> > >>>user to
> >> > >>> > >> >>know.
> >> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> >> > >>>annoying.
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >> > >>> > >><jay.kreps@gmail.com <javascript:;>>
> >> > >>> > >> >> >> wrote:
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >> > Cool.
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > I think blocking is good or alternately throwing
> >>an
> >> > >>> exception
> >> > >>> > >> >> >>directly
> >> > >>> > >> >> >> >> from
> >> > >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> >> > >>>doing
> >> > >>> > >> >>something
> >> > >>> > >> >> >> >> slightly
> >> > >>> > >> >> >> >> > different from what the user asked for as it will
> >>be
> >> > >>>hard
> >> > >>> to
> >> > >>> > >> >>notice
> >> > >>> > >> >> >> >>that
> >> > >>> > >> >> >> >> > behavior difference.
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > -Jay
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > wrote:
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >> > > Hi Jay,
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
> >>thinks
> >> as
> >> > >>> long
> >> > >>> > >>as
> >> > >>> > >> >>we
> >> > >>> > >> >> >> >>have
> >> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> >> interface,
> >> > >>> there
> >> > >>> > >> >>would
> >> > >>> > >> >> >> >>be no
> >> > >>> > >> >> >> >> > > problem. And I also agree it is better if we can
> >> make
> >> > >>> > >>producer
> >> > >>> > >> >> >>block
> >> > >>> > >> >> >> >> when
> >> > >>> > >> >> >> >> > > close() is called from sender thread so user
> >>will
> >> > >>>notice
> >> > >>> > >> >> >>something
> >> > >>> > >> >> >> >>went
> >> > >>> > >> >> >> >> > > wrong.
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > Thanks.
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> >> > >>><jay.kreps@gmail.com <javascript:;>>
> >> > >>> > >> wrote:
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > >Hey Jiangjie,
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >I think this is going to be very confusing that
> >> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> >> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> >> > >>> > >> >> >> >> > > >I understand this appears in other apis, but it
> >> is a
> >> > >>> > >>constant
> >> > >>> > >> >> >> >>cause of
> >> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a
> >> way
> >> > >>>to
> >> > >>> > >>wait
> >> > >>> > >> >> >> >> indefinitely
> >> > >>> > >> >> >> >> > > >as
> >> > >>> > >> >> >> >> > > >we already have close() so having a magical
> >> constant
> >> > >>>for
> >> > >>> > >> >>that is
> >> > >>> > >> >> >> >> > > >redundant.
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> >> > >>>possible
> >> > >>> > >>and
> >> > >>> > >> >> >>would
> >> > >>> > >> >> >> >> block
> >> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently change
> >> the
> >> > >>> > >>behavior
> >> > >>> > >> >>is
> >> > >>> > >> >> >> >> probably
> >> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
> >>the
> >> > >>> callback
> >> > >>> > >> >>there
> >> > >>> > >> >> >>is
> >> > >>> > >> >> >> >> > > >actually
> >> > >>> > >> >> >> >> > > >some misunderstanding and they need to think
> >>more,
> >> > >>> > >>silently
> >> > >>> > >> >> >>making
> >> > >>> > >> >> >> >> this
> >> > >>> > >> >> >> >> > > >not
> >> > >>> > >> >> >> >> > > >block will hide the problem from them which is
> >>the
> >> > >>> > >>opposite
> >> > >>> > >> >>of
> >> > >>> > >> >> >> >>what we
> >> > >>> > >> >> >> >> > > >want.
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >-Jay
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > > >wrote:
> >> > >>> > >> >> >> >> > > >
> >> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> >> > >>>Since it
> >> > >>> > >> >>seems
> >> > >>> > >> >> >>we
> >> > >>> > >> >> >> >> > probably
> >> > >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> >> > >>>replying
> >> > >>> > >>from
> >> > >>> > >> >>the
> >> > >>> > >> >> >> >> > > >>discussion
> >> > >>> > >> >> >> >> > > >> thread here.
> >> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
> >>like
> >> > >>> > >>half-baked,
> >> > >>> > >> >> >> >>apologize
> >> > >>> > >> >> >> >> > for
> >> > >>> > >> >> >> >> > > >> the rush...
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> >> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests either
> >> are
> >> > >>>sent
> >> > >>> > >>or
> >> > >>> > >> >> >>reach
> >> > >>> > >> >> >> >> > request
> >> > >>> > >> >> >> >> > > >> timeout.
> >> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> >> > >>> > >>immediately
> >> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> >> equivalent
> >> > >>>to
> >> > >>> > >> >>close(),
> >> > >>> > >> >> >> >>i.e.
> >> > >>> > >> >> >> >> > Wait
> >> > >>> > >> >> >> >> > > >> until all requests are sent or reach request
> >> > >>>timeout
> >> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
> >>the
> >> > >>>best to
> >> > >>> > >> >>finish
> >> > >>> > >> >> >> >> sending
> >> > >>> > >> >> >> >> > > >>in 5
> >> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> >> > >>>shutdown
> >> > >>> the
> >> > >>> > >> >> >>producer
> >> > >>> > >> >> >> >> > > >>anyway,
> >> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> About how we define what timeout value stands
> >> for,
> >> > >>>I
> >> > >>> > >> >>actually
> >> > >>> > >> >> >> >> > struggled
> >> > >>> > >> >> >> >> > > >>a
> >> > >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> >> > >>>close(0)
> >> > >>> > >> >>should
> >> > >>> > >> >> >> >>mean
> >> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> >> > >>>existing
> >> > >>> java
> >> > >>> > >> >>class
> >> > >>> > >> >> >> >>have
> >> > >>> > >> >> >> >> > this
> >> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
> >> never
> >> > >>> > >>timeout
> >> > >>> > >> >> >> >> > > >>(Thread.join(0),
> >> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> >> > >>>either we
> >> > >>> > >> >>follow
> >> > >>> > >> >> >>the
> >> > >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> >> > >>>chose is
> >> > >>> > >>to
> >> > >>> > >> >> >>follow
> >> > >>> > >> >> >> >>the
> >> > >>> > >> >> >> >> > > >> convention but document the interface to let
> >> user
> >> > >>>be
> >> > >>> > >>aware
> >> > >>> > >> >>of
> >> > >>> > >> >> >>the
> >> > >>> > >> >> >> >> > usage.
> >> > >>> > >> >> >> >> > > >> The reason is that I think producer.close()
> >>is a
> >> > >>> public
> >> > >>> > >> >> >> >>interface so
> >> > >>> > >> >> >> >> > it
> >> > >>> > >> >> >> >> > > >> might be better to follow java convention.
> >> Whereas
> >> > >>> > >> >>selector is
> >> > >>> > >> >> >> >>not a
> >> > >>> > >> >> >> >> > > >> public interface that used by user, so as
> >>long
> >> as
> >> > >>>it
> >> > >>> > >>makes
> >> > >>> > >> >> >>sense
> >> > >>> > >> >> >> >>to
> >> > >>> > >> >> >> >> > us,
> >> > >>> > >> >> >> >> > > >>it
> >> > >>> > >> >> >> >> > > >> is less a problem to be different from java
> >> > >>> convention.
> >> > >>> > >> >>That
> >> > >>> > >> >> >>said
> >> > >>> > >> >> >> >> > since
> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> >> interface,
> >> > >>>I
> >> > >>> > >>think
> >> > >>> > >> >>it
> >> > >>> > >> >> >> >>also
> >> > >>> > >> >> >> >> > makes
> >> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
> >>same
> >> > >>> > >>definition
> >> > >>> > >> >>of
> >> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
> >>close
> >> > >>>would
> >> > >>> > >>be
> >> > >>> > >> >> >> >>separating
> >> > >>> > >> >> >> >> > the
> >> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> >> probably
> >> > >>> makes
> >> > >>> > >> >> >>sense. I
> >> > >>> > >> >> >> >> > would
> >> > >>> > >> >> >> >> > > >> guess typically the request timeout would be
> >> long
> >> > >>> (e.g.
> >> > >>> > >>60
> >> > >>> > >> >> >> >>seconds)
> >> > >>> > >> >> >> >> > > >> because we might want to consider retries
> >>with
> >> > >>>back
> >> > >>> off
> >> > >>> > >> >>time.
> >> > >>> > >> >> >>If
> >> > >>> > >> >> >> >>we
> >> > >>> > >> >> >> >> > have
> >> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
> >>case
> >> > >>>that
> >> > >>> > >>could
> >> > >>> > >> >>take
> >> > >>> > >> >> >> >>up to
> >> > >>> > >> >> >> >> > > >> several minutes to complete all the requests.
> >> But
> >> > >>>when
> >> > >>> > >>we
> >> > >>> > >> >> >>close a
> >> > >>> > >> >> >> >> > > >> producer, we might not want to wait for that
> >> long
> >> > >>>as
> >> > >>> it
> >> > >>> > >> >>might
> >> > >>> > >> >> >> >>cause
> >> > >>> > >> >> >> >> > some
> >> > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> There is also a subtle difference between
> >> > >>> close(timeout)
> >> > >>> > >> >>and
> >> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush()
> >>is
> >> to
> >> > >>> write
> >> > >>> > >> >>data
> >> > >>> > >> >> >>to
> >> > >>> > >> >> >> >>the
> >> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
> >>until
> >> > >>> request
> >> > >>> > >> >> >>timeout.
> >> > >>> > >> >> >> >>I
> >> > >>> > >> >> >> >> > think
> >> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On
> >>the
> >> > >>>other
> >> > >>> > >> >>hand,
> >> > >>> > >> >> >>the
> >> > >>> > >> >> >> >>top
> >> > >>> > >> >> >> >> > > >> priority for close() is to close the producer
> >> > >>>rather
> >> > >>> > >>than
> >> > >>> > >> >> >>flush()
> >> > >>> > >> >> >> >> > data,
> >> > >>> > >> >> >> >> > > >>so
> >> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> >> waiting
> >> > >>>for
> >> > >>> > >>its
> >> > >>> > >> >>main
> >> > >>> > >> >> >> >>job.
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
> >>flag.
> >> It
> >> > >>>is
> >> > >>> > >>not a
> >> > >>> > >> >> >>public
> >> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> >> > >>>section
> >> > >>> > >> >>which I
> >> > >>> > >> >> >> >> thought
> >> > >>> > >> >> >> >> > > >>was
> >> > >>> > >> >> >> >> > > >> supposed to provide implementation details.
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> >> suggestions!
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> >> > >>> jqin@linkedin.com <javascript:;>>
> >> > >>> > >> >> wrote:
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> >> comments.
> >> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if
> >>no
> >> > >>>further
> >> > >>> > >> >> >>comments
> >> > >>> > >> >> >> >>are
> >> > >>> > >> >> >> >> > > >> >received by tomorrow.
> >> > >>> > >> >> >> >> > > >> >
> >> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> > >>> > >> >> >> >> > > >> >
> >> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> >> > >>><jay.kreps@gmail.com <javascript:;>
> >> > >>> >
> >> > >>> > >> >>wrote:
> >> > >>> > >> >> >> >> > > >> >
> >> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> >> > >>>cases
> >> > >>> for
> >> > >>> > >>the
> >> > >>> > >> >> >> >>feature?
> >> > >>> > >> >> >> >> > > >>This
> >> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> >> > >>>aborting
> >> > >>> > >>from
> >> > >>> > >> >> >>inside
> >> > >>> > >> >> >> >>the
> >> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
> >>that
> >> > >>>usage
> >> > >>> or
> >> > >>> > >>why
> >> > >>> > >> >> >>other
> >> > >>> > >> >> >> >> > people
> >> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list
> >> the
> >> > >>> > >>primary
> >> > >>> > >> >>use
> >> > >>> > >> >> >> >>case
> >> > >>> > >> >> >> >> for
> >> > >>> > >> >> >> >> > > >> >>having
> >> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
> >> avoid
> >> > >>> > >>blocking
> >> > >>> > >> >>too
> >> > >>> > >> >> >> >>long
> >> > >>> > >> >> >> >> on
> >> > >>> > >> >> >> >> > > >> >>shutdown.
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>-Jay
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie
> >>Qin
> >> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> >> > >>> > >> >> >> >> > > >> >
> >> > >>> > >> >> >> >> > > >> >>wrote:
> >> > >>> > >> >> >> >> > > >> >>
> >> > >>> > >> >> >> >> > > >> >>> Hi,
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> >> > >>>close(timeout)
> >> > >>> to
> >> > >>> > >>new
> >> > >>> > >> >> >> >> producer.
> >> > >>> > >> >> >> >> > > >>Most
> >> > >>> > >> >> >> >> > > >> >>>of
> >> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
> >>KAFKA-1660
> >> > >>>where
> >> > >>> > >>Parth
> >> > >>> > >> >> >> >> Brahmbhatt
> >> > >>> > >> >> >> >> > > >>has
> >> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> >> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so we
> >>are
> >> > >>>going
> >> > >>> > >> >>through
> >> > >>> > >> >> >>the
> >> > >>> > >> >> >> >> KIP
> >> > >>> > >> >> >> >> > > >> >>>process.
> >> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >>
> >> > >>> > >> >>
> >> > >>> > >>
> >> > >>> >
> >> > >>>
> >> >
> >>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> > >>> > >> >> >> >> > > >> >>>2
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >> >>> Thanks.
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> > >>> > >> >> >> >> > > >> >>>
> >> > >>> > >> >> >> >> > > >> >
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > > >>
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> > >
> >> > >>> > >> >> >> >> >
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >> --
> >> > >>> > >> >> >> >> Thanks,
> >> > >>> > >> >> >> >> Neha
> >> > >>> > >> >> >> >>
> >> > >>> > >> >> >> >
> >> > >>> > >> >> >> >
> >> > >>> > >> >> >> >
> >> > >>> > >> >> >> >--
> >> > >>> > >> >> >> >-- Guozhang
> >> > >>> > >> >> >>
> >> > >>> > >> >> >>
> >> > >>> > >> >> >
> >> > >>> > >> >> >
> >> > >>> > >> >> >--
> >> > >>> > >> >> >-- Guozhang
> >> > >>> > >> >>
> >> > >>> > >> >>
> >> > >>> > >> >
> >> > >>> > >> >
> >> > >>> > >> >--
> >> > >>> > >> >-- Guozhang
> >> > >>> > >>
> >> > >>> > >>
> >> > >>> >
> >> > >>> >
> >> > >>>
> >> > >>
> >> > >>
> >> > >>
> >> > >>--
> >> > >>-- Guozhang
> >> > >
> >> >
> >> >
> >>
> >
> >
> >--
> >Sent from Gmail Mobile
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

I think the we agreed that we are going to log an error and block. By
doing this we can make sure the error log to be checked by user in all
cases. 
If we silently replace close() to close(0) in sender thread, in some cases
such as send error during a normal close(), user might not notice
something went wrong.

On 3/19/15, 10:34 PM, "Joel Koshy" <jj...@gmail.com> wrote:

>close can probably just getCurrentThread and check if == senderThread.
>I'm actually not sure from this thread if there was clear agreement on
>whether it should change close(timeout)/close() to close(0) or if it
>should
>log an error and block up to the timeout.
>
>On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:
>
>> So in (1), if a close() or close(timeout) is called from a callback, we
>> will just turn that into a close(0)? Implementation wise, how do we know
>> whether a close() call is made from the sender thread or not?
>>
>> Thanks,
>>
>> Jun
>>
>> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> wrote:
>>
>> > It looks we have another option and are now deciding between the
>> following
>> > two interfaces:
>> >
>> > 1. Close() + close(timeout)
>> >   - timeout could be either positive or zero.
>> >   - only close(0) can be called from sender thread
>> >
>> > 2. Close() + abort() + close(timeout)
>> >   - timeout can either be positive or zero
>> >   - only abort() can be called from sender thread
>> >
>> >   - abort() is equivalent to close(0) in 1) but does not join sender
>> > thread and does not close metrics.
>> >   - Another thread has to call close() or close(timeout) in order to
>>make
>> > sure the resources in producer are gone.
>> >
>> > The tow approach provides the same function we need, the difference is
>> > approach 2) follows convention of close() and abort(). On the other
>>hand,
>> > approach 1) saves one interface compared with approach 2) but does not
>> > follow the convention.
>> >
>> > When the two approaches come to user code, it is probably something
>>like
>> > this:
>> >
>> > Try {
>> >   While(!finished)
>> >     Producer.send(record, callback)
>> > } catch (Exception e) {
>> >   Producer.close(5)
>> > }
>> >
>> > Class CallbackImpl implements Callback {
>> >   onCompletion(RecordMetadata metadata Exception e) {
>> >     If (e != null)
>> >       Abort() / close()
>> >   }
>> > }
>> >
>> > Because the two approach leads to almost the same user code, assuming
>> > users are always calling producer.close() as a clean up step,
>>personally
>> I
>> > prefer approach 2) as it follows convention.
>> >
>> > Any thoughts?
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com
>><javascript:;>>
>> wrote:
>> >
>> > >Hi Jun,
>> > >
>> > >Yes, as Guozhang said, the main reason we set a flag is because
>>close(0)
>> > >is expected to be called by sender thread itself.
>> > >If we want to maintain the semantic meaning of close(), one
>>alternative
>> is
>> > >to have an abort() method does the same thing as close(0) except
>> cleanup.
>> > >And in close(timeout), after timeout we call abort() and join the
>>sender
>> > >thread. This was one of the previous proposal. We merged abort to
>> close(0)
>> > >because they are almost doing the same thing. But from what you
>> mentioned,
>> > >it might make sense to have two separate methods.
>> > >
>> > >Thanks.
>> > >
>> > >Jiangjie (Becket) Qin
>> > >
>> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
>> <javascript:;>> wrote:
>> > >
>> > >>Yeah in this sense the sender thread will not exist immediately in
>>the
>> > >>close(0) call, but will only terminate after the current response
>>batch
>> > >>has
>> > >>been processed, as will the producer instance itself.
>> > >>
>> > >>There is a reason for this though: for a clean shutdown the caller
>> thread
>> > >>has to wait for the sender thread to join before closing the
>>producer
>> > >>instance, but this cannot be achieve if close(0) is called by the
>> sender
>> > >>thread itself (for example in KAFKA-1659, there is a proposal from
>> Andrew
>> > >>Stein on using thread.interrupt and thread.stop, but if it is
>>called by
>> > >>the
>> > >>ioThread itself the stop call will fail). Hence we came up with the
>> flag
>> > >>approach to let the sender thread to close as soon as it is at the
>> > >>barrier
>> > >>of the run loop.
>> > >>
>> > >>Guozhang
>> > >>
>> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
>> <javascript:;>> wrote:
>> > >>
>> > >>> Hmm, does that mean that after close(0), the sender thread is not
>> > >>>necessary
>> > >>> gone? Normally, after closing an entity, we expect all internal
>> threads
>> > >>> associated with the entity are shut down completely.
>> > >>>
>> > >>> Thanks,
>> > >>>
>> > >>> Jun
>> > >>>
>> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>> > >>><jq...@linkedin.com.invalid>
>> > >>> wrote:
>> > >>>
>> > >>> > Hi Jun,
>> > >>> >
>> > >>> > Close(0) will set two flags in sender. Running=false and a newly
>> > >>>added
>> > >>> > forceClose=true. It will also set accumulator.closed=true so no
>> > >>>further
>> > >>> > producer.send() will succeed.
>> > >>> > The sender thread will finish executing all the callbacks in
>> current
>> > >>> batch
>> > >>> > of responses, then it will see the forceClose flag. It will just
>> fail
>> > >>>all
>> > >>> > the incomplete batches in the producer and exit.
>> > >>> > So close(0) is a non-blocking call and sender thread will not
>>try
>> to
>> > >>>join
>> > >>> > itself in close(0).
>> > >>> >
>> > >>> > Thanks.
>> > >>> >
>> > >>> > Jiangjie (Becket) Qin
>> > >>> >
>> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io <javascript:;>>
>> wrote:
>> > >>> >
>> > >>> > >How does close(0) work if it's called from the sender thread?
>>If
>> > >>> close(0)
>> > >>> > >needs to wait for the sender thread to join, wouldn't this
>>cause a
>> > >>> > >deadlock?
>> > >>> > >
>> > >>> > >Thanks,
>> > >>> > >
>> > >>> > >Jun
>> > >>> > >
>> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>> > >>><jqin@linkedin.com.invalid
>> > >>> >
>> > >>> > >wrote:
>> > >>> > >
>> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered
>>without
>> > >>> > >> discussing with you :)
>> > >>> > >>
>> > >>> > >> Jiangjie (Becket) Qin
>> > >>> > >>
>> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangguoz@gmail.com
>> <javascript:;>> wrote:
>> > >>> > >>
>> > >>> > >> >Thanks Jiangjie,
>> > >>> > >> >
>> > >>> > >> >After talking to you offline on this, I have been convinced
>>and
>> > >>> > >>changed my
>> > >>> > >> >preference to blocking. The immediate shutdown approach does
>> have
>> > >>> some
>> > >>> > >> >unsafeness in some cases.
>> > >>> > >> >
>> > >>> > >> >Guozhang
>> > >>> > >> >
>> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>> > >>> > >><jqin@linkedin.com.invalid
>> > >>> > >> >
>> > >>> > >> >wrote:
>> > >>> > >> >
>> > >>> > >> >> It looks that the problem we want to solve and the
>>purpose we
>> > >>>want
>> > >>> to
>> > >>> > >> >> achieve is:
>> > >>> > >> >> If user uses close() in callback, we want to let user be
>> aware
>> > >>>that
>> > >>> > >>they
>> > >>> > >> >> should use close(0) instead of close() in the callback.
>> > >>> > >> >>
>> > >>> > >> >> We have agreed that we will have an error log to inform
>>user
>> > >>>about
>> > >>> > >>this
>> > >>> > >> >> mis-usage. The options differ in the way how we can force
>> user
>> > >>>to
>> > >>> > >>take a
>> > >>> > >> >> look at that error log.
>> > >>> > >> >> There are two scenarios:
>> > >>> > >> >> 1. User does not expect the program to exit.
>> > >>> > >> >> 2. User expect the program to exit.
>> > >>> > >> >>
>> > >>> > >> >> For scenario 1), blocking will probably delay the
>>discovery
>> of
>> > >>>the
>> > >>> > >> >> problem. Calling close(0) exposes the problem quicker. In
>> this
>> > >>> > >>scenario
>> > >>> > >> >> producer just encounter a send failure when running
>>normally.
>> > >>> > >> >> For scenario 2), blocking will expose the problem quick.
>> > >>>Calling
>> > >>> > >> >>close(-1)
>> > >>> > >> >> might hide the problem. This scenario might include: a)
>>Unit
>> > >>>test
>> > >>> > >>for a
>> > >>> > >> >> send failure. b) Message sending during a close() call
>>from a
>> > >>>user
>> > >>> > >> >>thread.
>> > >>> > >> >>
>> > >>> > >> >> So as a summary table:
>> > >>> > >> >>
>> > >>> > >> >>                   Scenario 1)
>>  Scenario
>> > >>>2)
>> > >>> > >> >>
>> > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
>> > >>>problem
>> > >>> > >> >>discovery
>> > >>> > >> >>
>> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem
>>might
>> be
>> > >>> hidden
>> > >>> > >> >>
>> > >>> > >> >>
>> > >>> > >> >> Personally I prefer blocking because it seems providing
>>more
>> > >>> > >>guarantees
>> > >>> > >> >> and safer.
>> > >>> > >> >>
>> > >>> > >> >> Thanks.
>> > >>> > >> >>
>> > >>> > >> >> Jiangjie (Becket) Qin
>> > >>> > >> >>
>> > >>> > >> >>
>> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wangguoz@gmail.com
>> <javascript:;>>
>> > >>>wrote:
>> > >>> > >> >>
>> > >>> > >> >> >HI Jiangjie,
>> > >>> > >> >> >
>> > >>> > >> >> >As far as I understand calling close() in the ioThread is
>> not
>> > >>> > >>common,
>> > >>> > >> >>as
>> > >>> > >> >> >it
>> > >>> > >> >> >may only trigger when we saw some non-retriable error.
>>Hence
>> > >>>when
>> > >>> > >>user
>> > >>> > >> >>run
>> > >>> > >> >> >their program it is unlikely that close() will be
>>triggered
>> > >>>and
>> > >>> > >>problem
>> > >>> > >> >> >will be detected. So it seems to me that from the error
>> > >>>detection
>> > >>> > >> >>aspect
>> > >>> > >> >> >these two options seems to be the same as people will
>> usually
>> > >>> > >>detect it
>> > >>> > >> >> >from the producer metrics all dropping to 0.
>> > >>> > >> >> >
>> > >>> > >> >> >Guozhang
>> > >>> > >> >> >
>> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> > >>> > >> >><jq...@linkedin.com.invalid>
>> > >>> > >> >> >wrote:
>> > >>> > >> >> >
>> > >>> > >> >> >> It seems there are two options we can choose from when
>> > >>>close()
>> > >>> is
>> > >>> > >> >>called
>> > >>> > >> >> >> from sender thread (callback):
>> > >>> > >> >> >> 1. Log an error and close the producer using close(-1)
>> > >>> > >> >> >> 2. Log an error and block.
>> > >>> > >> >> >> (Throwing an exception will not work because we catch
>>all
>> > >>>the
>> > >>> > >> >>exception
>> > >>> > >> >> >> thrown from user callback. It will just lead to an
>>error
>> > >>>log.)
>> > >>> > >> >> >>
>> > >>> > >> >> >> My concern for the first option is that the producer
>>will
>> be
>> > >>> > >>closed
>> > >>> > >> >>even
>> > >>> > >> >> >> if we logged and error. I am wondering if some user
>>would
>> > >>>not
>> > >>> even
>> > >>> > >> >>take
>> > >>> > >> >> >>a
>> > >>> > >> >> >> look at the log if producer is closed normally. Because
>> from
>> > >>>the
>> > >>> > >> >> >>programs
>> > >>> > >> >> >> behavior, everything looks good. If that is the case,
>>the
>> > >>>error
>> > >>> > >> >>message
>> > >>> > >> >> >>we
>> > >>> > >> >> >> logged probably will just be ignored until some day
>>when
>> > >>>people
>> > >>> > >>check
>> > >>> > >> >> >>the
>> > >>> > >> >> >> log and see it.
>> > >>> > >> >> >>
>> > >>> > >> >> >> As for the second option, because producer does not
>>close
>> > >>>but
>> > >>> > >>blocks.
>> > >>> > >> >> >>User
>> > >>> > >> >> >> will notice this the first time they run the program.
>>They
>> > >>> > >>probably
>> > >>> > >> >>will
>> > >>> > >> >> >> look at the log to see why producer could not be closed
>> and
>> > >>>they
>> > >>> > >>will
>> > >>> > >> >> >>see
>> > >>> > >> >> >> the error log we put there. So they will get informed
>> about
>> > >>>this
>> > >>> > >> >> >>mis-usage
>> > >>> > >> >> >> of close() in sender thread the first time they run the
>> code
>> > >>> > >>instead
>> > >>> > >> >>of
>> > >>> > >> >> >> some time later.
>> > >>> > >> >> >>
>> > >>> > >> >> >> Personally I prefer the second one because it is more
>> > >>>obvious
>> > >>> that
>> > >>> > >> >> >> something was wrong.
>> > >>> > >> >> >>
>> > >>> > >> >> >> Jiangjie (Becket) Qin
>> > >>> > >> >> >>
>> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang"
>><wangguoz@gmail.com
>> <javascript:;>>
>> > >>> wrote:
>> > >>> > >> >> >>
>> > >>> > >> >> >> >Yeah I agree we should not silently change the
>>behavior
>> of
>> > >>>the
>> > >>> > >> >>function
>> > >>> > >> >> >> >with the given parameters; and I would prefer
>> > >>> > >> >> >>error-logging-and-shutdown
>> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
>> > >>>suggested
>> > >>> > >> >>blocking
>> > >>> > >> >> >> >would also not proceed with sending any data, bu will
>> just
>> > >>>let
>> > >>> > >> >>users to
>> > >>> > >> >> >> >realize the issue later than sooner.
>> > >>> > >> >> >> >
>> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>> > >>> > >><neha@confluent.io <javascript:;>>
>> > >>> > >> >> >>wrote:
>> > >>> > >> >> >> >
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > And I also agree it is better if we can make
>>producer
>> > >>>block
>> > >>> > >>when
>> > >>> > >> >> >> >> > close() is called from sender thread so user will
>> > >>>notice
>> > >>> > >> >>something
>> > >>> > >> >> >> >>went
>> > >>> > >> >> >> >> > wrong.
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >> This isn't a great experience either. Why can't we
>>just
>> > >>>throw
>> > >>> > >>an
>> > >>> > >> >> >> >>exception
>> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like
>>the
>> > >>>user to
>> > >>> > >> >>know.
>> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
>> > >>>annoying.
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>> > >>> > >><jay.kreps@gmail.com <javascript:;>>
>> > >>> > >> >> >> wrote:
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >> > Cool.
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > I think blocking is good or alternately throwing
>>an
>> > >>> exception
>> > >>> > >> >> >>directly
>> > >>> > >> >> >> >> from
>> > >>> > >> >> >> >> > close(). Basically I would just worry about subtly
>> > >>>doing
>> > >>> > >> >>something
>> > >>> > >> >> >> >> slightly
>> > >>> > >> >> >> >> > different from what the user asked for as it will
>>be
>> > >>>hard
>> > >>> to
>> > >>> > >> >>notice
>> > >>> > >> >> >> >>that
>> > >>> > >> >> >> >> > behavior difference.
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > -Jay
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> > >>> > >> >> >> >><jqin@linkedin.com.invalid
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > wrote:
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >> > > Hi Jay,
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I
>>thinks
>> as
>> > >>> long
>> > >>> > >>as
>> > >>> > >> >>we
>> > >>> > >> >> >> >>have
>> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
>> interface,
>> > >>> there
>> > >>> > >> >>would
>> > >>> > >> >> >> >>be no
>> > >>> > >> >> >> >> > > problem. And I also agree it is better if we can
>> make
>> > >>> > >>producer
>> > >>> > >> >> >>block
>> > >>> > >> >> >> >> when
>> > >>> > >> >> >> >> > > close() is called from sender thread so user
>>will
>> > >>>notice
>> > >>> > >> >> >>something
>> > >>> > >> >> >> >>went
>> > >>> > >> >> >> >> > > wrong.
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > Thanks.
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>> > >>><jay.kreps@gmail.com <javascript:;>>
>> > >>> > >> wrote:
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > >Hey Jiangjie,
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >I think this is going to be very confusing that
>> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
>> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
>> > >>> > >> >> >> >> > > >I understand this appears in other apis, but it
>> is a
>> > >>> > >>constant
>> > >>> > >> >> >> >>cause of
>> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a
>> way
>> > >>>to
>> > >>> > >>wait
>> > >>> > >> >> >> >> indefinitely
>> > >>> > >> >> >> >> > > >as
>> > >>> > >> >> >> >> > > >we already have close() so having a magical
>> constant
>> > >>>for
>> > >>> > >> >>that is
>> > >>> > >> >> >> >> > > >redundant.
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
>> > >>>possible
>> > >>> > >>and
>> > >>> > >> >> >>would
>> > >>> > >> >> >> >> block
>> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently change
>> the
>> > >>> > >>behavior
>> > >>> > >> >>is
>> > >>> > >> >> >> >> probably
>> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in
>>the
>> > >>> callback
>> > >>> > >> >>there
>> > >>> > >> >> >>is
>> > >>> > >> >> >> >> > > >actually
>> > >>> > >> >> >> >> > > >some misunderstanding and they need to think
>>more,
>> > >>> > >>silently
>> > >>> > >> >> >>making
>> > >>> > >> >> >> >> this
>> > >>> > >> >> >> >> > > >not
>> > >>> > >> >> >> >> > > >block will hide the problem from them which is
>>the
>> > >>> > >>opposite
>> > >>> > >> >>of
>> > >>> > >> >> >> >>what we
>> > >>> > >> >> >> >> > > >want.
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >-Jay
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > > >wrote:
>> > >>> > >> >> >> >> > > >
>> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
>> > >>>Since it
>> > >>> > >> >>seems
>> > >>> > >> >> >>we
>> > >>> > >> >> >> >> > probably
>> > >>> > >> >> >> >> > > >> will have more discussion on this, I am just
>> > >>>replying
>> > >>> > >>from
>> > >>> > >> >>the
>> > >>> > >> >> >> >> > > >>discussion
>> > >>> > >> >> >> >> > > >> thread here.
>> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less
>>like
>> > >>> > >>half-baked,
>> > >>> > >> >> >> >>apologize
>> > >>> > >> >> >> >> > for
>> > >>> > >> >> >> >> > > >> the rush...
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> The contract in current KIP is:
>> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests either
>> are
>> > >>>sent
>> > >>> > >>or
>> > >>> > >> >> >>reach
>> > >>> > >> >> >> >> > request
>> > >>> > >> >> >> >> > > >> timeout.
>> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
>> > >>> > >>immediately
>> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
>> equivalent
>> > >>>to
>> > >>> > >> >>close(),
>> > >>> > >> >> >> >>i.e.
>> > >>> > >> >> >> >> > Wait
>> > >>> > >> >> >> >> > > >> until all requests are sent or reach request
>> > >>>timeout
>> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try
>>the
>> > >>>best to
>> > >>> > >> >>finish
>> > >>> > >> >> >> >> sending
>> > >>> > >> >> >> >> > > >>in 5
>> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
>> > >>>shutdown
>> > >>> the
>> > >>> > >> >> >>producer
>> > >>> > >> >> >> >> > > >>anyway,
>> > >>> > >> >> >> >> > > >> my callback will handle the failures.
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> About how we define what timeout value stands
>> for,
>> > >>>I
>> > >>> > >> >>actually
>> > >>> > >> >> >> >> > struggled
>> > >>> > >> >> >> >> > > >>a
>> > >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
>> > >>>close(0)
>> > >>> > >> >>should
>> > >>> > >> >> >> >>mean
>> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
>> > >>>existing
>> > >>> java
>> > >>> > >> >>class
>> > >>> > >> >> >> >>have
>> > >>> > >> >> >> >> > this
>> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
>> never
>> > >>> > >>timeout
>> > >>> > >> >> >> >> > > >>(Thread.join(0),
>> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
>> > >>>either we
>> > >>> > >> >>follow
>> > >>> > >> >> >>the
>> > >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
>> > >>>chose is
>> > >>> > >>to
>> > >>> > >> >> >>follow
>> > >>> > >> >> >> >>the
>> > >>> > >> >> >> >> > > >> convention but document the interface to let
>> user
>> > >>>be
>> > >>> > >>aware
>> > >>> > >> >>of
>> > >>> > >> >> >>the
>> > >>> > >> >> >> >> > usage.
>> > >>> > >> >> >> >> > > >> The reason is that I think producer.close()
>>is a
>> > >>> public
>> > >>> > >> >> >> >>interface so
>> > >>> > >> >> >> >> > it
>> > >>> > >> >> >> >> > > >> might be better to follow java convention.
>> Whereas
>> > >>> > >> >>selector is
>> > >>> > >> >> >> >>not a
>> > >>> > >> >> >> >> > > >> public interface that used by user, so as
>>long
>> as
>> > >>>it
>> > >>> > >>makes
>> > >>> > >> >> >>sense
>> > >>> > >> >> >> >>to
>> > >>> > >> >> >> >> > us,
>> > >>> > >> >> >> >> > > >>it
>> > >>> > >> >> >> >> > > >> is less a problem to be different from java
>> > >>> convention.
>> > >>> > >> >>That
>> > >>> > >> >> >>said
>> > >>> > >> >> >> >> > since
>> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
>> interface,
>> > >>>I
>> > >>> > >>think
>> > >>> > >> >>it
>> > >>> > >> >> >> >>also
>> > >>> > >> >> >> >> > makes
>> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the
>>same
>> > >>> > >>definition
>> > >>> > >> >>of
>> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in
>>close
>> > >>>would
>> > >>> > >>be
>> > >>> > >> >> >> >>separating
>> > >>> > >> >> >> >> > the
>> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
>> probably
>> > >>> makes
>> > >>> > >> >> >>sense. I
>> > >>> > >> >> >> >> > would
>> > >>> > >> >> >> >> > > >> guess typically the request timeout would be
>> long
>> > >>> (e.g.
>> > >>> > >>60
>> > >>> > >> >> >> >>seconds)
>> > >>> > >> >> >> >> > > >> because we might want to consider retries
>>with
>> > >>>back
>> > >>> off
>> > >>> > >> >>time.
>> > >>> > >> >> >>If
>> > >>> > >> >> >> >>we
>> > >>> > >> >> >> >> > have
>> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst
>>case
>> > >>>that
>> > >>> > >>could
>> > >>> > >> >>take
>> > >>> > >> >> >> >>up to
>> > >>> > >> >> >> >> > > >> several minutes to complete all the requests.
>> But
>> > >>>when
>> > >>> > >>we
>> > >>> > >> >> >>close a
>> > >>> > >> >> >> >> > > >> producer, we might not want to wait for that
>> long
>> > >>>as
>> > >>> it
>> > >>> > >> >>might
>> > >>> > >> >> >> >>cause
>> > >>> > >> >> >> >> > some
>> > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> There is also a subtle difference between
>> > >>> close(timeout)
>> > >>> > >> >>and
>> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush()
>>is
>> to
>> > >>> write
>> > >>> > >> >>data
>> > >>> > >> >> >>to
>> > >>> > >> >> >> >>the
>> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait
>>until
>> > >>> request
>> > >>> > >> >> >>timeout.
>> > >>> > >> >> >> >>I
>> > >>> > >> >> >> >> > think
>> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On
>>the
>> > >>>other
>> > >>> > >> >>hand,
>> > >>> > >> >> >>the
>> > >>> > >> >> >> >>top
>> > >>> > >> >> >> >> > > >> priority for close() is to close the producer
>> > >>>rather
>> > >>> > >>than
>> > >>> > >> >> >>flush()
>> > >>> > >> >> >> >> > data,
>> > >>> > >> >> >> >> > > >>so
>> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
>> waiting
>> > >>>for
>> > >>> > >>its
>> > >>> > >> >>main
>> > >>> > >> >> >> >>job.
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose
>>flag.
>> It
>> > >>>is
>> > >>> > >>not a
>> > >>> > >> >> >>public
>> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
>> > >>>section
>> > >>> > >> >>which I
>> > >>> > >> >> >> >> thought
>> > >>> > >> >> >> >> > > >>was
>> > >>> > >> >> >> >> > > >> supposed to provide implementation details.
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
>> suggestions!
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>> > >>> jqin@linkedin.com <javascript:;>>
>> > >>> > >> >> wrote:
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
>> comments.
>> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if
>>no
>> > >>>further
>> > >>> > >> >> >>comments
>> > >>> > >> >> >> >>are
>> > >>> > >> >> >> >> > > >> >received by tomorrow.
>> > >>> > >> >> >> >> > > >> >
>> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>> > >>> > >> >> >> >> > > >> >
>> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>> > >>><jay.kreps@gmail.com <javascript:;>
>> > >>> >
>> > >>> > >> >>wrote:
>> > >>> > >> >> >> >> > > >> >
>> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
>> > >>>cases
>> > >>> for
>> > >>> > >>the
>> > >>> > >> >> >> >>feature?
>> > >>> > >> >> >> >> > > >>This
>> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
>> > >>>aborting
>> > >>> > >>from
>> > >>> > >> >> >>inside
>> > >>> > >> >> >> >>the
>> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain
>>that
>> > >>>usage
>> > >>> or
>> > >>> > >>why
>> > >>> > >> >> >>other
>> > >>> > >> >> >> >> > people
>> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list
>> the
>> > >>> > >>primary
>> > >>> > >> >>use
>> > >>> > >> >> >> >>case
>> > >>> > >> >> >> >> for
>> > >>> > >> >> >> >> > > >> >>having
>> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
>> avoid
>> > >>> > >>blocking
>> > >>> > >> >>too
>> > >>> > >> >> >> >>long
>> > >>> > >> >> >> >> on
>> > >>> > >> >> >> >> > > >> >>shutdown.
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>-Jay
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie
>>Qin
>> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>> > >>> > >> >> >> >> > > >> >
>> > >>> > >> >> >> >> > > >> >>wrote:
>> > >>> > >> >> >> >> > > >> >>
>> > >>> > >> >> >> >> > > >> >>> Hi,
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>> > >>>close(timeout)
>> > >>> to
>> > >>> > >>new
>> > >>> > >> >> >> >> producer.
>> > >>> > >> >> >> >> > > >>Most
>> > >>> > >> >> >> >> > > >> >>>of
>> > >>> > >> >> >> >> > > >> >>> the previous discussions are in
>>KAFKA-1660
>> > >>>where
>> > >>> > >>Parth
>> > >>> > >> >> >> >> Brahmbhatt
>> > >>> > >> >> >> >> > > >>has
>> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
>> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so we
>>are
>> > >>>going
>> > >>> > >> >>through
>> > >>> > >> >> >>the
>> > >>> > >> >> >> >> KIP
>> > >>> > >> >> >> >> > > >> >>>process.
>> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >>
>> > >>> > >> >> >>
>> > >>> > >> >>
>> > >>> > >>
>> > >>> >
>> > >>>
>> > 
>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> > >>> > >> >> >> >> > > >> >>>2
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >> >>> Thanks.
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> > >>> > >> >> >> >> > > >> >>>
>> > >>> > >> >> >> >> > > >> >
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > > >>
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> > >
>> > >>> > >> >> >> >> >
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >> --
>> > >>> > >> >> >> >> Thanks,
>> > >>> > >> >> >> >> Neha
>> > >>> > >> >> >> >>
>> > >>> > >> >> >> >
>> > >>> > >> >> >> >
>> > >>> > >> >> >> >
>> > >>> > >> >> >> >--
>> > >>> > >> >> >> >-- Guozhang
>> > >>> > >> >> >>
>> > >>> > >> >> >>
>> > >>> > >> >> >
>> > >>> > >> >> >
>> > >>> > >> >> >--
>> > >>> > >> >> >-- Guozhang
>> > >>> > >> >>
>> > >>> > >> >>
>> > >>> > >> >
>> > >>> > >> >
>> > >>> > >> >--
>> > >>> > >> >-- Guozhang
>> > >>> > >>
>> > >>> > >>
>> > >>> >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >>--
>> > >>-- Guozhang
>> > >
>> >
>> >
>>
>
>
>-- 
>Sent from Gmail Mobile

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Joel Koshy <jj...@gmail.com>.

close can probably just getCurrentThread and check if == senderThread.
I'm actually not sure from this thread if there was clear agreement on
whether it should change close(timeout)/close() to close(0) or if it should
log an error and block up to the timeout.

On Thursday, March 19, 2015, Jun Rao <ju...@confluent.io> wrote:

> So in (1), if a close() or close(timeout) is called from a callback, we
> will just turn that into a close(0)? Implementation wise, how do we know
> whether a close() call is made from the sender thread or not?
>
> Thanks,
>
> Jun
>
> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > It looks we have another option and are now deciding between the
> following
> > two interfaces:
> >
> > 1. Close() + close(timeout)
> >   - timeout could be either positive or zero.
> >   - only close(0) can be called from sender thread
> >
> > 2. Close() + abort() + close(timeout)
> >   - timeout can either be positive or zero
> >   - only abort() can be called from sender thread
> >
> >   - abort() is equivalent to close(0) in 1) but does not join sender
> > thread and does not close metrics.
> >   - Another thread has to call close() or close(timeout) in order to make
> > sure the resources in producer are gone.
> >
> > The tow approach provides the same function we need, the difference is
> > approach 2) follows convention of close() and abort(). On the other hand,
> > approach 1) saves one interface compared with approach 2) but does not
> > follow the convention.
> >
> > When the two approaches come to user code, it is probably something like
> > this:
> >
> > Try {
> >   While(!finished)
> >     Producer.send(record, callback)
> > } catch (Exception e) {
> >   Producer.close(5)
> > }
> >
> > Class CallbackImpl implements Callback {
> >   onCompletion(RecordMetadata metadata Exception e) {
> >     If (e != null)
> >       Abort() / close()
> >   }
> > }
> >
> > Because the two approach leads to almost the same user code, assuming
> > users are always calling producer.close() as a clean up step, personally
> I
> > prefer approach 2) as it follows convention.
> >
> > Any thoughts?
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jqin@linkedin.com <javascript:;>>
> wrote:
> >
> > >Hi Jun,
> > >
> > >Yes, as Guozhang said, the main reason we set a flag is because close(0)
> > >is expected to be called by sender thread itself.
> > >If we want to maintain the semantic meaning of close(), one alternative
> is
> > >to have an abort() method does the same thing as close(0) except
> cleanup.
> > >And in close(timeout), after timeout we call abort() and join the sender
> > >thread. This was one of the previous proposal. We merged abort to
> close(0)
> > >because they are almost doing the same thing. But from what you
> mentioned,
> > >it might make sense to have two separate methods.
> > >
> > >Thanks.
> > >
> > >Jiangjie (Becket) Qin
> > >
> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangguoz@gmail.com
> <javascript:;>> wrote:
> > >
> > >>Yeah in this sense the sender thread will not exist immediately in the
> > >>close(0) call, but will only terminate after the current response batch
> > >>has
> > >>been processed, as will the producer instance itself.
> > >>
> > >>There is a reason for this though: for a clean shutdown the caller
> thread
> > >>has to wait for the sender thread to join before closing the producer
> > >>instance, but this cannot be achieve if close(0) is called by the
> sender
> > >>thread itself (for example in KAFKA-1659, there is a proposal from
> Andrew
> > >>Stein on using thread.interrupt and thread.stop, but if it is called by
> > >>the
> > >>ioThread itself the stop call will fail). Hence we came up with the
> flag
> > >>approach to let the sender thread to close as soon as it is at the
> > >>barrier
> > >>of the run loop.
> > >>
> > >>Guozhang
> > >>
> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <jun@confluent.io
> <javascript:;>> wrote:
> > >>
> > >>> Hmm, does that mean that after close(0), the sender thread is not
> > >>>necessary
> > >>> gone? Normally, after closing an entity, we expect all internal
> threads
> > >>> associated with the entity are shut down completely.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jun
> > >>>
> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> > >>><jq...@linkedin.com.invalid>
> > >>> wrote:
> > >>>
> > >>> > Hi Jun,
> > >>> >
> > >>> > Close(0) will set two flags in sender. Running=false and a newly
> > >>>added
> > >>> > forceClose=true. It will also set accumulator.closed=true so no
> > >>>further
> > >>> > producer.send() will succeed.
> > >>> > The sender thread will finish executing all the callbacks in
> current
> > >>> batch
> > >>> > of responses, then it will see the forceClose flag. It will just
> fail
> > >>>all
> > >>> > the incomplete batches in the producer and exit.
> > >>> > So close(0) is a non-blocking call and sender thread will not try
> to
> > >>>join
> > >>> > itself in close(0).
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> > Jiangjie (Becket) Qin
> > >>> >
> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <jun@confluent.io <javascript:;>>
> wrote:
> > >>> >
> > >>> > >How does close(0) work if it's called from the sender thread? If
> > >>> close(0)
> > >>> > >needs to wait for the sender thread to join, wouldn't this cause a
> > >>> > >deadlock?
> > >>> > >
> > >>> > >Thanks,
> > >>> > >
> > >>> > >Jun
> > >>> > >
> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> > >>><jqin@linkedin.com.invalid
> > >>> >
> > >>> > >wrote:
> > >>> > >
> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> > >>> > >> discussing with you :)
> > >>> > >>
> > >>> > >> Jiangjie (Becket) Qin
> > >>> > >>
> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangguoz@gmail.com
> <javascript:;>> wrote:
> > >>> > >>
> > >>> > >> >Thanks Jiangjie,
> > >>> > >> >
> > >>> > >> >After talking to you offline on this, I have been convinced and
> > >>> > >>changed my
> > >>> > >> >preference to blocking. The immediate shutdown approach does
> have
> > >>> some
> > >>> > >> >unsafeness in some cases.
> > >>> > >> >
> > >>> > >> >Guozhang
> > >>> > >> >
> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > >>> > >><jqin@linkedin.com.invalid
> > >>> > >> >
> > >>> > >> >wrote:
> > >>> > >> >
> > >>> > >> >> It looks that the problem we want to solve and the purpose we
> > >>>want
> > >>> to
> > >>> > >> >> achieve is:
> > >>> > >> >> If user uses close() in callback, we want to let user be
> aware
> > >>>that
> > >>> > >>they
> > >>> > >> >> should use close(0) instead of close() in the callback.
> > >>> > >> >>
> > >>> > >> >> We have agreed that we will have an error log to inform user
> > >>>about
> > >>> > >>this
> > >>> > >> >> mis-usage. The options differ in the way how we can force
> user
> > >>>to
> > >>> > >>take a
> > >>> > >> >> look at that error log.
> > >>> > >> >> There are two scenarios:
> > >>> > >> >> 1. User does not expect the program to exit.
> > >>> > >> >> 2. User expect the program to exit.
> > >>> > >> >>
> > >>> > >> >> For scenario 1), blocking will probably delay the discovery
> of
> > >>>the
> > >>> > >> >> problem. Calling close(0) exposes the problem quicker. In
> this
> > >>> > >>scenario
> > >>> > >> >> producer just encounter a send failure when running normally.
> > >>> > >> >> For scenario 2), blocking will expose the problem quick.
> > >>>Calling
> > >>> > >> >>close(-1)
> > >>> > >> >> might hide the problem. This scenario might include: a) Unit
> > >>>test
> > >>> > >>for a
> > >>> > >> >> send failure. b) Message sending during a close() call from a
> > >>>user
> > >>> > >> >>thread.
> > >>> > >> >>
> > >>> > >> >> So as a summary table:
> > >>> > >> >>
> > >>> > >> >>                   Scenario 1)
>  Scenario
> > >>>2)
> > >>> > >> >>
> > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> > >>>problem
> > >>> > >> >>discovery
> > >>> > >> >>
> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem might
> be
> > >>> hidden
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >> Personally I prefer blocking because it seems providing more
> > >>> > >>guarantees
> > >>> > >> >> and safer.
> > >>> > >> >>
> > >>> > >> >> Thanks.
> > >>> > >> >>
> > >>> > >> >> Jiangjie (Becket) Qin
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wangguoz@gmail.com
> <javascript:;>>
> > >>>wrote:
> > >>> > >> >>
> > >>> > >> >> >HI Jiangjie,
> > >>> > >> >> >
> > >>> > >> >> >As far as I understand calling close() in the ioThread is
> not
> > >>> > >>common,
> > >>> > >> >>as
> > >>> > >> >> >it
> > >>> > >> >> >may only trigger when we saw some non-retriable error. Hence
> > >>>when
> > >>> > >>user
> > >>> > >> >>run
> > >>> > >> >> >their program it is unlikely that close() will be triggered
> > >>>and
> > >>> > >>problem
> > >>> > >> >> >will be detected. So it seems to me that from the error
> > >>>detection
> > >>> > >> >>aspect
> > >>> > >> >> >these two options seems to be the same as people will
> usually
> > >>> > >>detect it
> > >>> > >> >> >from the producer metrics all dropping to 0.
> > >>> > >> >> >
> > >>> > >> >> >Guozhang
> > >>> > >> >> >
> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > >>> > >> >><jq...@linkedin.com.invalid>
> > >>> > >> >> >wrote:
> > >>> > >> >> >
> > >>> > >> >> >> It seems there are two options we can choose from when
> > >>>close()
> > >>> is
> > >>> > >> >>called
> > >>> > >> >> >> from sender thread (callback):
> > >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> > >>> > >> >> >> 2. Log an error and block.
> > >>> > >> >> >> (Throwing an exception will not work because we catch all
> > >>>the
> > >>> > >> >>exception
> > >>> > >> >> >> thrown from user callback. It will just lead to an error
> > >>>log.)
> > >>> > >> >> >>
> > >>> > >> >> >> My concern for the first option is that the producer will
> be
> > >>> > >>closed
> > >>> > >> >>even
> > >>> > >> >> >> if we logged and error. I am wondering if some user would
> > >>>not
> > >>> even
> > >>> > >> >>take
> > >>> > >> >> >>a
> > >>> > >> >> >> look at the log if producer is closed normally. Because
> from
> > >>>the
> > >>> > >> >> >>programs
> > >>> > >> >> >> behavior, everything looks good. If that is the case, the
> > >>>error
> > >>> > >> >>message
> > >>> > >> >> >>we
> > >>> > >> >> >> logged probably will just be ignored until some day when
> > >>>people
> > >>> > >>check
> > >>> > >> >> >>the
> > >>> > >> >> >> log and see it.
> > >>> > >> >> >>
> > >>> > >> >> >> As for the second option, because producer does not close
> > >>>but
> > >>> > >>blocks.
> > >>> > >> >> >>User
> > >>> > >> >> >> will notice this the first time they run the program. They
> > >>> > >>probably
> > >>> > >> >>will
> > >>> > >> >> >> look at the log to see why producer could not be closed
> and
> > >>>they
> > >>> > >>will
> > >>> > >> >> >>see
> > >>> > >> >> >> the error log we put there. So they will get informed
> about
> > >>>this
> > >>> > >> >> >>mis-usage
> > >>> > >> >> >> of close() in sender thread the first time they run the
> code
> > >>> > >>instead
> > >>> > >> >>of
> > >>> > >> >> >> some time later.
> > >>> > >> >> >>
> > >>> > >> >> >> Personally I prefer the second one because it is more
> > >>>obvious
> > >>> that
> > >>> > >> >> >> something was wrong.
> > >>> > >> >> >>
> > >>> > >> >> >> Jiangjie (Becket) Qin
> > >>> > >> >> >>
> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wangguoz@gmail.com
> <javascript:;>>
> > >>> wrote:
> > >>> > >> >> >>
> > >>> > >> >> >> >Yeah I agree we should not silently change the behavior
> of
> > >>>the
> > >>> > >> >>function
> > >>> > >> >> >> >with the given parameters; and I would prefer
> > >>> > >> >> >>error-logging-and-shutdown
> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> > >>>suggested
> > >>> > >> >>blocking
> > >>> > >> >> >> >would also not proceed with sending any data, bu will
> just
> > >>>let
> > >>> > >> >>users to
> > >>> > >> >> >> >realize the issue later than sooner.
> > >>> > >> >> >> >
> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > >>> > >><neha@confluent.io <javascript:;>>
> > >>> > >> >> >>wrote:
> > >>> > >> >> >> >
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > And I also agree it is better if we can make producer
> > >>>block
> > >>> > >>when
> > >>> > >> >> >> >> > close() is called from sender thread so user will
> > >>>notice
> > >>> > >> >>something
> > >>> > >> >> >> >>went
> > >>> > >> >> >> >> > wrong.
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> This isn't a great experience either. Why can't we just
> > >>>throw
> > >>> > >>an
> > >>> > >> >> >> >>exception
> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
> > >>>user to
> > >>> > >> >>know.
> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> > >>>annoying.
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > >>> > >><jay.kreps@gmail.com <javascript:;>>
> > >>> > >> >> >> wrote:
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> > Cool.
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > I think blocking is good or alternately throwing an
> > >>> exception
> > >>> > >> >> >>directly
> > >>> > >> >> >> >> from
> > >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> > >>>doing
> > >>> > >> >>something
> > >>> > >> >> >> >> slightly
> > >>> > >> >> >> >> > different from what the user asked for as it will be
> > >>>hard
> > >>> to
> > >>> > >> >>notice
> > >>> > >> >> >> >>that
> > >>> > >> >> >> >> > behavior difference.
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > -Jay
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > wrote:
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > > Hi Jay,
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks
> as
> > >>> long
> > >>> > >>as
> > >>> > >> >>we
> > >>> > >> >> >> >>have
> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> interface,
> > >>> there
> > >>> > >> >>would
> > >>> > >> >> >> >>be no
> > >>> > >> >> >> >> > > problem. And I also agree it is better if we can
> make
> > >>> > >>producer
> > >>> > >> >> >>block
> > >>> > >> >> >> >> when
> > >>> > >> >> >> >> > > close() is called from sender thread so user will
> > >>>notice
> > >>> > >> >> >>something
> > >>> > >> >> >> >>went
> > >>> > >> >> >> >> > > wrong.
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > Thanks.
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> > >>><jay.kreps@gmail.com <javascript:;>>
> > >>> > >> wrote:
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > >Hey Jiangjie,
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >I think this is going to be very confusing that
> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> > >>> > >> >> >> >> > > >I understand this appears in other apis, but it
> is a
> > >>> > >>constant
> > >>> > >> >> >> >>cause of
> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a
> way
> > >>>to
> > >>> > >>wait
> > >>> > >> >> >> >> indefinitely
> > >>> > >> >> >> >> > > >as
> > >>> > >> >> >> >> > > >we already have close() so having a magical
> constant
> > >>>for
> > >>> > >> >>that is
> > >>> > >> >> >> >> > > >redundant.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> > >>>possible
> > >>> > >>and
> > >>> > >> >> >>would
> > >>> > >> >> >> >> block
> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently change
> the
> > >>> > >>behavior
> > >>> > >> >>is
> > >>> > >> >> >> >> probably
> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> > >>> callback
> > >>> > >> >>there
> > >>> > >> >> >>is
> > >>> > >> >> >> >> > > >actually
> > >>> > >> >> >> >> > > >some misunderstanding and they need to think more,
> > >>> > >>silently
> > >>> > >> >> >>making
> > >>> > >> >> >> >> this
> > >>> > >> >> >> >> > > >not
> > >>> > >> >> >> >> > > >block will hide the problem from them which is the
> > >>> > >>opposite
> > >>> > >> >>of
> > >>> > >> >> >> >>what we
> > >>> > >> >> >> >> > > >want.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >-Jay
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > >wrote:
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> > >>>Since it
> > >>> > >> >>seems
> > >>> > >> >> >>we
> > >>> > >> >> >> >> > probably
> > >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> > >>>replying
> > >>> > >>from
> > >>> > >> >>the
> > >>> > >> >> >> >> > > >>discussion
> > >>> > >> >> >> >> > > >> thread here.
> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> > >>> > >>half-baked,
> > >>> > >> >> >> >>apologize
> > >>> > >> >> >> >> > for
> > >>> > >> >> >> >> > > >> the rush...
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests either
> are
> > >>>sent
> > >>> > >>or
> > >>> > >> >> >>reach
> > >>> > >> >> >> >> > request
> > >>> > >> >> >> >> > > >> timeout.
> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> > >>> > >>immediately
> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> equivalent
> > >>>to
> > >>> > >> >>close(),
> > >>> > >> >> >> >>i.e.
> > >>> > >> >> >> >> > Wait
> > >>> > >> >> >> >> > > >> until all requests are sent or reach request
> > >>>timeout
> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
> > >>>best to
> > >>> > >> >>finish
> > >>> > >> >> >> >> sending
> > >>> > >> >> >> >> > > >>in 5
> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> > >>>shutdown
> > >>> the
> > >>> > >> >> >>producer
> > >>> > >> >> >> >> > > >>anyway,
> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> About how we define what timeout value stands
> for,
> > >>>I
> > >>> > >> >>actually
> > >>> > >> >> >> >> > struggled
> > >>> > >> >> >> >> > > >>a
> > >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> > >>>close(0)
> > >>> > >> >>should
> > >>> > >> >> >> >>mean
> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> > >>>existing
> > >>> java
> > >>> > >> >>class
> > >>> > >> >> >> >>have
> > >>> > >> >> >> >> > this
> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
> never
> > >>> > >>timeout
> > >>> > >> >> >> >> > > >>(Thread.join(0),
> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> > >>>either we
> > >>> > >> >>follow
> > >>> > >> >> >>the
> > >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> > >>>chose is
> > >>> > >>to
> > >>> > >> >> >>follow
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> convention but document the interface to let
> user
> > >>>be
> > >>> > >>aware
> > >>> > >> >>of
> > >>> > >> >> >>the
> > >>> > >> >> >> >> > usage.
> > >>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> > >>> public
> > >>> > >> >> >> >>interface so
> > >>> > >> >> >> >> > it
> > >>> > >> >> >> >> > > >> might be better to follow java convention.
> Whereas
> > >>> > >> >>selector is
> > >>> > >> >> >> >>not a
> > >>> > >> >> >> >> > > >> public interface that used by user, so as long
> as
> > >>>it
> > >>> > >>makes
> > >>> > >> >> >>sense
> > >>> > >> >> >> >>to
> > >>> > >> >> >> >> > us,
> > >>> > >> >> >> >> > > >>it
> > >>> > >> >> >> >> > > >> is less a problem to be different from java
> > >>> convention.
> > >>> > >> >>That
> > >>> > >> >> >>said
> > >>> > >> >> >> >> > since
> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> interface,
> > >>>I
> > >>> > >>think
> > >>> > >> >>it
> > >>> > >> >> >> >>also
> > >>> > >> >> >> >> > makes
> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the same
> > >>> > >>definition
> > >>> > >> >>of
> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
> > >>>would
> > >>> > >>be
> > >>> > >> >> >> >>separating
> > >>> > >> >> >> >> > the
> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> probably
> > >>> makes
> > >>> > >> >> >>sense. I
> > >>> > >> >> >> >> > would
> > >>> > >> >> >> >> > > >> guess typically the request timeout would be
> long
> > >>> (e.g.
> > >>> > >>60
> > >>> > >> >> >> >>seconds)
> > >>> > >> >> >> >> > > >> because we might want to consider retries with
> > >>>back
> > >>> off
> > >>> > >> >>time.
> > >>> > >> >> >>If
> > >>> > >> >> >> >>we
> > >>> > >> >> >> >> > have
> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
> > >>>that
> > >>> > >>could
> > >>> > >> >>take
> > >>> > >> >> >> >>up to
> > >>> > >> >> >> >> > > >> several minutes to complete all the requests.
> But
> > >>>when
> > >>> > >>we
> > >>> > >> >> >>close a
> > >>> > >> >> >> >> > > >> producer, we might not want to wait for that
> long
> > >>>as
> > >>> it
> > >>> > >> >>might
> > >>> > >> >> >> >>cause
> > >>> > >> >> >> >> > some
> > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> There is also a subtle difference between
> > >>> close(timeout)
> > >>> > >> >>and
> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is
> to
> > >>> write
> > >>> > >> >>data
> > >>> > >> >> >>to
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> > >>> request
> > >>> > >> >> >>timeout.
> > >>> > >> >> >> >>I
> > >>> > >> >> >> >> > think
> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
> > >>>other
> > >>> > >> >>hand,
> > >>> > >> >> >>the
> > >>> > >> >> >> >>top
> > >>> > >> >> >> >> > > >> priority for close() is to close the producer
> > >>>rather
> > >>> > >>than
> > >>> > >> >> >>flush()
> > >>> > >> >> >> >> > data,
> > >>> > >> >> >> >> > > >>so
> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> waiting
> > >>>for
> > >>> > >>its
> > >>> > >> >>main
> > >>> > >> >> >> >>job.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag.
> It
> > >>>is
> > >>> > >>not a
> > >>> > >> >> >>public
> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> > >>>section
> > >>> > >> >>which I
> > >>> > >> >> >> >> thought
> > >>> > >> >> >> >> > > >>was
> > >>> > >> >> >> >> > > >> supposed to provide implementation details.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> suggestions!
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> > >>> jqin@linkedin.com <javascript:;>>
> > >>> > >> >> wrote:
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> comments.
> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
> > >>>further
> > >>> > >> >> >>comments
> > >>> > >> >> >> >>are
> > >>> > >> >> >> >> > > >> >received by tomorrow.
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> > >>><jay.kreps@gmail.com <javascript:;>
> > >>> >
> > >>> > >> >>wrote:
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> > >>>cases
> > >>> for
> > >>> > >>the
> > >>> > >> >> >> >>feature?
> > >>> > >> >> >> >> > > >>This
> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> > >>>aborting
> > >>> > >>from
> > >>> > >> >> >>inside
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
> > >>>usage
> > >>> or
> > >>> > >>why
> > >>> > >> >> >>other
> > >>> > >> >> >> >> > people
> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list
> the
> > >>> > >>primary
> > >>> > >> >>use
> > >>> > >> >> >> >>case
> > >>> > >> >> >> >> for
> > >>> > >> >> >> >> > > >> >>having
> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
> avoid
> > >>> > >>blocking
> > >>> > >> >>too
> > >>> > >> >> >> >>long
> > >>> > >> >> >> >> on
> > >>> > >> >> >> >> > > >> >>shutdown.
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>-Jay
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >>wrote:
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>> Hi,
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> > >>>close(timeout)
> > >>> to
> > >>> > >>new
> > >>> > >> >> >> >> producer.
> > >>> > >> >> >> >> > > >>Most
> > >>> > >> >> >> >> > > >> >>>of
> > >>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
> > >>>where
> > >>> > >>Parth
> > >>> > >> >> >> >> Brahmbhatt
> > >>> > >> >> >> >> > > >>has
> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
> > >>>going
> > >>> > >> >>through
> > >>> > >> >> >>the
> > >>> > >> >> >> >> KIP
> > >>> > >> >> >> >> > > >> >>>process.
> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >>
> > >>> > >> >> >>
> > >>> > >> >>
> > >>> > >>
> > >>> >
> > >>>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > >>> > >> >> >> >> > > >> >>>2
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> Thanks.
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> --
> > >>> > >> >> >> >> Thanks,
> > >>> > >> >> >> >> Neha
> > >>> > >> >> >> >>
> > >>> > >> >> >> >
> > >>> > >> >> >> >
> > >>> > >> >> >> >
> > >>> > >> >> >> >--
> > >>> > >> >> >> >-- Guozhang
> > >>> > >> >> >>
> > >>> > >> >> >>
> > >>> > >> >> >
> > >>> > >> >> >
> > >>> > >> >> >--
> > >>> > >> >> >-- Guozhang
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >--
> > >>> > >> >-- Guozhang
> > >>> > >>
> > >>> > >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >>--
> > >>-- Guozhang
> > >
> >
> >
>


-- 
Sent from Gmail Mobile

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

So in (1), if a close() or close(timeout) is called from a callback, we
will just turn that into a close(0)? Implementation wise, how do we know
whether a close() call is made from the sender thread or not?

Thanks,

Jun

On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> It looks we have another option and are now deciding between the following
> two interfaces:
>
> 1. Close() + close(timeout)
>   - timeout could be either positive or zero.
>   - only close(0) can be called from sender thread
>
> 2. Close() + abort() + close(timeout)
>   - timeout can either be positive or zero
>   - only abort() can be called from sender thread
>
>   - abort() is equivalent to close(0) in 1) but does not join sender
> thread and does not close metrics.
>   - Another thread has to call close() or close(timeout) in order to make
> sure the resources in producer are gone.
>
> The tow approach provides the same function we need, the difference is
> approach 2) follows convention of close() and abort(). On the other hand,
> approach 1) saves one interface compared with approach 2) but does not
> follow the convention.
>
> When the two approaches come to user code, it is probably something like
> this:
>
> Try {
>   While(!finished)
>     Producer.send(record, callback)
> } catch (Exception e) {
>   Producer.close(5)
> }
>
> Class CallbackImpl implements Callback {
>   onCompletion(RecordMetadata metadata Exception e) {
>     If (e != null)
>       Abort() / close()
>   }
> }
>
> Because the two approach leads to almost the same user code, assuming
> users are always calling producer.close() as a clean up step, personally I
> prefer approach 2) as it follows convention.
>
> Any thoughts?
>
> Jiangjie (Becket) Qin
>
>
> On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>
> >Hi Jun,
> >
> >Yes, as Guozhang said, the main reason we set a flag is because close(0)
> >is expected to be called by sender thread itself.
> >If we want to maintain the semantic meaning of close(), one alternative is
> >to have an abort() method does the same thing as close(0) except cleanup.
> >And in close(timeout), after timeout we call abort() and join the sender
> >thread. This was one of the previous proposal. We merged abort to close(0)
> >because they are almost doing the same thing. But from what you mentioned,
> >it might make sense to have two separate methods.
> >
> >Thanks.
> >
> >Jiangjie (Becket) Qin
> >
> >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >
> >>Yeah in this sense the sender thread will not exist immediately in the
> >>close(0) call, but will only terminate after the current response batch
> >>has
> >>been processed, as will the producer instance itself.
> >>
> >>There is a reason for this though: for a clean shutdown the caller thread
> >>has to wait for the sender thread to join before closing the producer
> >>instance, but this cannot be achieve if close(0) is called by the sender
> >>thread itself (for example in KAFKA-1659, there is a proposal from Andrew
> >>Stein on using thread.interrupt and thread.stop, but if it is called by
> >>the
> >>ioThread itself the stop call will fail). Hence we came up with the flag
> >>approach to let the sender thread to close as soon as it is at the
> >>barrier
> >>of the run loop.
> >>
> >>Guozhang
> >>
> >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
> >>
> >>> Hmm, does that mean that after close(0), the sender thread is not
> >>>necessary
> >>> gone? Normally, after closing an entity, we expect all internal threads
> >>> associated with the entity are shut down completely.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> >>><jq...@linkedin.com.invalid>
> >>> wrote:
> >>>
> >>> > Hi Jun,
> >>> >
> >>> > Close(0) will set two flags in sender. Running=false and a newly
> >>>added
> >>> > forceClose=true. It will also set accumulator.closed=true so no
> >>>further
> >>> > producer.send() will succeed.
> >>> > The sender thread will finish executing all the callbacks in current
> >>> batch
> >>> > of responses, then it will see the forceClose flag. It will just fail
> >>>all
> >>> > the incomplete batches in the producer and exit.
> >>> > So close(0) is a non-blocking call and sender thread will not try to
> >>>join
> >>> > itself in close(0).
> >>> >
> >>> > Thanks.
> >>> >
> >>> > Jiangjie (Becket) Qin
> >>> >
> >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> >>> >
> >>> > >How does close(0) work if it's called from the sender thread? If
> >>> close(0)
> >>> > >needs to wait for the sender thread to join, wouldn't this cause a
> >>> > >deadlock?
> >>> > >
> >>> > >Thanks,
> >>> > >
> >>> > >Jun
> >>> > >
> >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> >>><jqin@linkedin.com.invalid
> >>> >
> >>> > >wrote:
> >>> > >
> >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> >>> > >> discussing with you :)
> >>> > >>
> >>> > >> Jiangjie (Becket) Qin
> >>> > >>
> >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >>> > >>
> >>> > >> >Thanks Jiangjie,
> >>> > >> >
> >>> > >> >After talking to you offline on this, I have been convinced and
> >>> > >>changed my
> >>> > >> >preference to blocking. The immediate shutdown approach does have
> >>> some
> >>> > >> >unsafeness in some cases.
> >>> > >> >
> >>> > >> >Guozhang
> >>> > >> >
> >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >>> > >><jqin@linkedin.com.invalid
> >>> > >> >
> >>> > >> >wrote:
> >>> > >> >
> >>> > >> >> It looks that the problem we want to solve and the purpose we
> >>>want
> >>> to
> >>> > >> >> achieve is:
> >>> > >> >> If user uses close() in callback, we want to let user be aware
> >>>that
> >>> > >>they
> >>> > >> >> should use close(0) instead of close() in the callback.
> >>> > >> >>
> >>> > >> >> We have agreed that we will have an error log to inform user
> >>>about
> >>> > >>this
> >>> > >> >> mis-usage. The options differ in the way how we can force user
> >>>to
> >>> > >>take a
> >>> > >> >> look at that error log.
> >>> > >> >> There are two scenarios:
> >>> > >> >> 1. User does not expect the program to exit.
> >>> > >> >> 2. User expect the program to exit.
> >>> > >> >>
> >>> > >> >> For scenario 1), blocking will probably delay the discovery of
> >>>the
> >>> > >> >> problem. Calling close(0) exposes the problem quicker. In this
> >>> > >>scenario
> >>> > >> >> producer just encounter a send failure when running normally.
> >>> > >> >> For scenario 2), blocking will expose the problem quick.
> >>>Calling
> >>> > >> >>close(-1)
> >>> > >> >> might hide the problem. This scenario might include: a) Unit
> >>>test
> >>> > >>for a
> >>> > >> >> send failure. b) Message sending during a close() call from a
> >>>user
> >>> > >> >>thread.
> >>> > >> >>
> >>> > >> >> So as a summary table:
> >>> > >> >>
> >>> > >> >>                   Scenario 1)                         Scenario
> >>>2)
> >>> > >> >>
> >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> >>>problem
> >>> > >> >>discovery
> >>> > >> >>
> >>> > >> >> Close(-1)     Immediate problem discovery     Problem might be
> >>> hidden
> >>> > >> >>
> >>> > >> >>
> >>> > >> >> Personally I prefer blocking because it seems providing more
> >>> > >>guarantees
> >>> > >> >> and safer.
> >>> > >> >>
> >>> > >> >> Thanks.
> >>> > >> >>
> >>> > >> >> Jiangjie (Becket) Qin
> >>> > >> >>
> >>> > >> >>
> >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
> >>>wrote:
> >>> > >> >>
> >>> > >> >> >HI Jiangjie,
> >>> > >> >> >
> >>> > >> >> >As far as I understand calling close() in the ioThread is not
> >>> > >>common,
> >>> > >> >>as
> >>> > >> >> >it
> >>> > >> >> >may only trigger when we saw some non-retriable error. Hence
> >>>when
> >>> > >>user
> >>> > >> >>run
> >>> > >> >> >their program it is unlikely that close() will be triggered
> >>>and
> >>> > >>problem
> >>> > >> >> >will be detected. So it seems to me that from the error
> >>>detection
> >>> > >> >>aspect
> >>> > >> >> >these two options seems to be the same as people will usually
> >>> > >>detect it
> >>> > >> >> >from the producer metrics all dropping to 0.
> >>> > >> >> >
> >>> > >> >> >Guozhang
> >>> > >> >> >
> >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >>> > >> >><jq...@linkedin.com.invalid>
> >>> > >> >> >wrote:
> >>> > >> >> >
> >>> > >> >> >> It seems there are two options we can choose from when
> >>>close()
> >>> is
> >>> > >> >>called
> >>> > >> >> >> from sender thread (callback):
> >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> >>> > >> >> >> 2. Log an error and block.
> >>> > >> >> >> (Throwing an exception will not work because we catch all
> >>>the
> >>> > >> >>exception
> >>> > >> >> >> thrown from user callback. It will just lead to an error
> >>>log.)
> >>> > >> >> >>
> >>> > >> >> >> My concern for the first option is that the producer will be
> >>> > >>closed
> >>> > >> >>even
> >>> > >> >> >> if we logged and error. I am wondering if some user would
> >>>not
> >>> even
> >>> > >> >>take
> >>> > >> >> >>a
> >>> > >> >> >> look at the log if producer is closed normally. Because from
> >>>the
> >>> > >> >> >>programs
> >>> > >> >> >> behavior, everything looks good. If that is the case, the
> >>>error
> >>> > >> >>message
> >>> > >> >> >>we
> >>> > >> >> >> logged probably will just be ignored until some day when
> >>>people
> >>> > >>check
> >>> > >> >> >>the
> >>> > >> >> >> log and see it.
> >>> > >> >> >>
> >>> > >> >> >> As for the second option, because producer does not close
> >>>but
> >>> > >>blocks.
> >>> > >> >> >>User
> >>> > >> >> >> will notice this the first time they run the program. They
> >>> > >>probably
> >>> > >> >>will
> >>> > >> >> >> look at the log to see why producer could not be closed and
> >>>they
> >>> > >>will
> >>> > >> >> >>see
> >>> > >> >> >> the error log we put there. So they will get informed about
> >>>this
> >>> > >> >> >>mis-usage
> >>> > >> >> >> of close() in sender thread the first time they run the code
> >>> > >>instead
> >>> > >> >>of
> >>> > >> >> >> some time later.
> >>> > >> >> >>
> >>> > >> >> >> Personally I prefer the second one because it is more
> >>>obvious
> >>> that
> >>> > >> >> >> something was wrong.
> >>> > >> >> >>
> >>> > >> >> >> Jiangjie (Becket) Qin
> >>> > >> >> >>
> >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
> >>> wrote:
> >>> > >> >> >>
> >>> > >> >> >> >Yeah I agree we should not silently change the behavior of
> >>>the
> >>> > >> >>function
> >>> > >> >> >> >with the given parameters; and I would prefer
> >>> > >> >> >>error-logging-and-shutdown
> >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> >>>suggested
> >>> > >> >>blocking
> >>> > >> >> >> >would also not proceed with sending any data, bu will just
> >>>let
> >>> > >> >>users to
> >>> > >> >> >> >realize the issue later than sooner.
> >>> > >> >> >> >
> >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >>> > >><ne...@confluent.io>
> >>> > >> >> >>wrote:
> >>> > >> >> >> >
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > And I also agree it is better if we can make producer
> >>>block
> >>> > >>when
> >>> > >> >> >> >> > close() is called from sender thread so user will
> >>>notice
> >>> > >> >>something
> >>> > >> >> >> >>went
> >>> > >> >> >> >> > wrong.
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >> This isn't a great experience either. Why can't we just
> >>>throw
> >>> > >>an
> >>> > >> >> >> >>exception
> >>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
> >>>user to
> >>> > >> >>know.
> >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> >>>annoying.
> >>> > >> >> >> >>
> >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >>> > >><ja...@gmail.com>
> >>> > >> >> >> wrote:
> >>> > >> >> >> >>
> >>> > >> >> >> >> > Cool.
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > I think blocking is good or alternately throwing an
> >>> exception
> >>> > >> >> >>directly
> >>> > >> >> >> >> from
> >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> >>>doing
> >>> > >> >>something
> >>> > >> >> >> >> slightly
> >>> > >> >> >> >> > different from what the user asked for as it will be
> >>>hard
> >>> to
> >>> > >> >>notice
> >>> > >> >> >> >>that
> >>> > >> >> >> >> > behavior difference.
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > -Jay
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >>> > >> >> >> >><jqin@linkedin.com.invalid
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > wrote:
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > > Hi Jay,
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks as
> >>> long
> >>> > >>as
> >>> > >> >>we
> >>> > >> >> >> >>have
> >>> > >> >> >> >> > > consistent define for timeout across Kafka interface,
> >>> there
> >>> > >> >>would
> >>> > >> >> >> >>be no
> >>> > >> >> >> >> > > problem. And I also agree it is better if we can make
> >>> > >>producer
> >>> > >> >> >>block
> >>> > >> >> >> >> when
> >>> > >> >> >> >> > > close() is called from sender thread so user will
> >>>notice
> >>> > >> >> >>something
> >>> > >> >> >> >>went
> >>> > >> >> >> >> > > wrong.
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > Thanks.
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> >>><ja...@gmail.com>
> >>> > >> wrote:
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > >Hey Jiangjie,
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >I think this is going to be very confusing that
> >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> >>> > >> >> >> >> > > >  close(-1) waits for 0.
> >>> > >> >> >> >> > > >I understand this appears in other apis, but it is a
> >>> > >>constant
> >>> > >> >> >> >>cause of
> >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way
> >>>to
> >>> > >>wait
> >>> > >> >> >> >> indefinitely
> >>> > >> >> >> >> > > >as
> >>> > >> >> >> >> > > >we already have close() so having a magical constant
> >>>for
> >>> > >> >>that is
> >>> > >> >> >> >> > > >redundant.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> >>>possible
> >>> > >>and
> >>> > >> >> >>would
> >>> > >> >> >> >> block
> >>> > >> >> >> >> > > >indefinitely. I think trying to silently change the
> >>> > >>behavior
> >>> > >> >>is
> >>> > >> >> >> >> probably
> >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> >>> callback
> >>> > >> >>there
> >>> > >> >> >>is
> >>> > >> >> >> >> > > >actually
> >>> > >> >> >> >> > > >some misunderstanding and they need to think more,
> >>> > >>silently
> >>> > >> >> >>making
> >>> > >> >> >> >> this
> >>> > >> >> >> >> > > >not
> >>> > >> >> >> >> > > >block will hide the problem from them which is the
> >>> > >>opposite
> >>> > >> >>of
> >>> > >> >> >> >>what we
> >>> > >> >> >> >> > > >want.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >-Jay
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >>> > >> >> >> >> <jqin@linkedin.com.invalid
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > >wrote:
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >> Hey Joe & Jay,
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> >>>Since it
> >>> > >> >>seems
> >>> > >> >> >>we
> >>> > >> >> >> >> > probably
> >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> >>>replying
> >>> > >>from
> >>> > >> >>the
> >>> > >> >> >> >> > > >>discussion
> >>> > >> >> >> >> > > >> thread here.
> >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> >>> > >>half-baked,
> >>> > >> >> >> >>apologize
> >>> > >> >> >> >> > for
> >>> > >> >> >> >> > > >> the rush...
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> The contract in current KIP is:
> >>> > >> >> >> >> > > >>   1. close() - wait until all requests either are
> >>>sent
> >>> > >>or
> >>> > >> >> >>reach
> >>> > >> >> >> >> > request
> >>> > >> >> >> >> > > >> timeout.
> >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> >>> > >>immediately
> >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent
> >>>to
> >>> > >> >>close(),
> >>> > >> >> >> >>i.e.
> >>> > >> >> >> >> > Wait
> >>> > >> >> >> >> > > >> until all requests are sent or reach request
> >>>timeout
> >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
> >>>best to
> >>> > >> >>finish
> >>> > >> >> >> >> sending
> >>> > >> >> >> >> > > >>in 5
> >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> >>>shutdown
> >>> the
> >>> > >> >> >>producer
> >>> > >> >> >> >> > > >>anyway,
> >>> > >> >> >> >> > > >> my callback will handle the failures.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> About how we define what timeout value stands for,
> >>>I
> >>> > >> >>actually
> >>> > >> >> >> >> > struggled
> >>> > >> >> >> >> > > >>a
> >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> >>>close(0)
> >>> > >> >>should
> >>> > >> >> >> >>mean
> >>> > >> >> >> >> > > >> immediately, however it seems that all the
> >>>existing
> >>> java
> >>> > >> >>class
> >>> > >> >> >> >>have
> >>> > >> >> >> >> > this
> >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or never
> >>> > >>timeout
> >>> > >> >> >> >> > > >>(Thread.join(0),
> >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> >>>either we
> >>> > >> >>follow
> >>> > >> >> >>the
> >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> >>>chose is
> >>> > >>to
> >>> > >> >> >>follow
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> convention but document the interface to let user
> >>>be
> >>> > >>aware
> >>> > >> >>of
> >>> > >> >> >>the
> >>> > >> >> >> >> > usage.
> >>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> >>> public
> >>> > >> >> >> >>interface so
> >>> > >> >> >> >> > it
> >>> > >> >> >> >> > > >> might be better to follow java convention. Whereas
> >>> > >> >>selector is
> >>> > >> >> >> >>not a
> >>> > >> >> >> >> > > >> public interface that used by user, so as long as
> >>>it
> >>> > >>makes
> >>> > >> >> >>sense
> >>> > >> >> >> >>to
> >>> > >> >> >> >> > us,
> >>> > >> >> >> >> > > >>it
> >>> > >> >> >> >> > > >> is less a problem to be different from java
> >>> convention.
> >>> > >> >>That
> >>> > >> >> >>said
> >>> > >> >> >> >> > since
> >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public interface,
> >>>I
> >>> > >>think
> >>> > >> >>it
> >>> > >> >> >> >>also
> >>> > >> >> >> >> > makes
> >>> > >> >> >> >> > > >> sense to make producer.close() to have the same
> >>> > >>definition
> >>> > >> >>of
> >>> > >> >> >> >> > > >> consumer.poll(timeout).
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
> >>>would
> >>> > >>be
> >>> > >> >> >> >>separating
> >>> > >> >> >> >> > the
> >>> > >> >> >> >> > > >> close timeout from request timeout, which probably
> >>> makes
> >>> > >> >> >>sense. I
> >>> > >> >> >> >> > would
> >>> > >> >> >> >> > > >> guess typically the request timeout would be long
> >>> (e.g.
> >>> > >>60
> >>> > >> >> >> >>seconds)
> >>> > >> >> >> >> > > >> because we might want to consider retries with
> >>>back
> >>> off
> >>> > >> >>time.
> >>> > >> >> >>If
> >>> > >> >> >> >>we
> >>> > >> >> >> >> > have
> >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
> >>>that
> >>> > >>could
> >>> > >> >>take
> >>> > >> >> >> >>up to
> >>> > >> >> >> >> > > >> several minutes to complete all the requests. But
> >>>when
> >>> > >>we
> >>> > >> >> >>close a
> >>> > >> >> >> >> > > >> producer, we might not want to wait for that long
> >>>as
> >>> it
> >>> > >> >>might
> >>> > >> >> >> >>cause
> >>> > >> >> >> >> > some
> >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> There is also a subtle difference between
> >>> close(timeout)
> >>> > >> >>and
> >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to
> >>> write
> >>> > >> >>data
> >>> > >> >> >>to
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> >>> request
> >>> > >> >> >>timeout.
> >>> > >> >> >> >>I
> >>> > >> >> >> >> > think
> >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
> >>>other
> >>> > >> >>hand,
> >>> > >> >> >>the
> >>> > >> >> >> >>top
> >>> > >> >> >> >> > > >> priority for close() is to close the producer
> >>>rather
> >>> > >>than
> >>> > >> >> >>flush()
> >>> > >> >> >> >> > data,
> >>> > >> >> >> >> > > >>so
> >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting
> >>>for
> >>> > >>its
> >>> > >> >>main
> >>> > >> >> >> >>job.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It
> >>>is
> >>> > >>not a
> >>> > >> >> >>public
> >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> >>>section
> >>> > >> >>which I
> >>> > >> >> >> >> thought
> >>> > >> >> >> >> > > >>was
> >>> > >> >> >> >> > > >> supposed to provide implementation details.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Thanks again for all the comments and suggestions!
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> >>> jqin@linkedin.com>
> >>> > >> >> wrote:
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
> >>>further
> >>> > >> >> >>comments
> >>> > >> >> >> >>are
> >>> > >> >> >> >> > > >> >received by tomorrow.
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> >>><jay.kreps@gmail.com
> >>> >
> >>> > >> >>wrote:
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> >>>cases
> >>> for
> >>> > >>the
> >>> > >> >> >> >>feature?
> >>> > >> >> >> >> > > >>This
> >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> >>>aborting
> >>> > >>from
> >>> > >> >> >>inside
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
> >>>usage
> >>> or
> >>> > >>why
> >>> > >> >> >>other
> >>> > >> >> >> >> > people
> >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list the
> >>> > >>primary
> >>> > >> >>use
> >>> > >> >> >> >>case
> >>> > >> >> >> >> for
> >>> > >> >> >> >> > > >> >>having
> >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
> >>> > >>blocking
> >>> > >> >>too
> >>> > >> >> >> >>long
> >>> > >> >> >> >> on
> >>> > >> >> >> >> > > >> >>shutdown.
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>-Jay
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >>wrote:
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>> Hi,
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> >>>close(timeout)
> >>> to
> >>> > >>new
> >>> > >> >> >> >> producer.
> >>> > >> >> >> >> > > >>Most
> >>> > >> >> >> >> > > >> >>>of
> >>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
> >>>where
> >>> > >>Parth
> >>> > >> >> >> >> Brahmbhatt
> >>> > >> >> >> >> > > >>has
> >>> > >> >> >> >> > > >> >>> already done a lot of work.
> >>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
> >>>going
> >>> > >> >>through
> >>> > >> >> >>the
> >>> > >> >> >> >> KIP
> >>> > >> >> >> >> > > >> >>>process.
> >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > >
> >>> > >> >> >> >>
> >>> > >> >> >>
> >>> > >> >>
> >>> > >>
> >>> >
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >>> > >> >> >> >> > > >> >>>2
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> Thanks.
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> >
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >> --
> >>> > >> >> >> >> Thanks,
> >>> > >> >> >> >> Neha
> >>> > >> >> >> >>
> >>> > >> >> >> >
> >>> > >> >> >> >
> >>> > >> >> >> >
> >>> > >> >> >> >--
> >>> > >> >> >> >-- Guozhang
> >>> > >> >> >>
> >>> > >> >> >>
> >>> > >> >> >
> >>> > >> >> >
> >>> > >> >> >--
> >>> > >> >> >-- Guozhang
> >>> > >> >>
> >>> > >> >>
> >>> > >> >
> >>> > >> >
> >>> > >> >--
> >>> > >> >-- Guozhang
> >>> > >>
> >>> > >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
> >>--
> >>-- Guozhang
> >
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Neha Narkhede <ne...@confluent.io>.

I'm in favor of (1) for the sake of simplicity and as Jay mentions to
reduce the number of different APIs. Can you explain when (1) does not work?

On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <ja...@gmail.com> wrote:

> Personally I'm in favor of (1) just to reduce the number of different APIs.
> People will find the difference between abort and close subtle and
> confusing and the only instance where you want it is this somewhat unusual
> case you guys are pursuing, right?
>
> -Jay
>
> On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > It looks we have another option and are now deciding between the
> following
> > two interfaces:
> >
> > 1. Close() + close(timeout)
> >   - timeout could be either positive or zero.
> >   - only close(0) can be called from sender thread
> >
> > 2. Close() + abort() + close(timeout)
> >   - timeout can either be positive or zero
> >   - only abort() can be called from sender thread
> >
> >   - abort() is equivalent to close(0) in 1) but does not join sender
> > thread and does not close metrics.
> >   - Another thread has to call close() or close(timeout) in order to make
> > sure the resources in producer are gone.
> >
> > The tow approach provides the same function we need, the difference is
> > approach 2) follows convention of close() and abort(). On the other hand,
> > approach 1) saves one interface compared with approach 2) but does not
> > follow the convention.
> >
> > When the two approaches come to user code, it is probably something like
> > this:
> >
> > Try {
> >   While(!finished)
> >     Producer.send(record, callback)
> > } catch (Exception e) {
> >   Producer.close(5)
> > }
> >
> > Class CallbackImpl implements Callback {
> >   onCompletion(RecordMetadata metadata Exception e) {
> >     If (e != null)
> >       Abort() / close()
> >   }
> > }
> >
> > Because the two approach leads to almost the same user code, assuming
> > users are always calling producer.close() as a clean up step, personally
> I
> > prefer approach 2) as it follows convention.
> >
> > Any thoughts?
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >
> > >Hi Jun,
> > >
> > >Yes, as Guozhang said, the main reason we set a flag is because close(0)
> > >is expected to be called by sender thread itself.
> > >If we want to maintain the semantic meaning of close(), one alternative
> is
> > >to have an abort() method does the same thing as close(0) except
> cleanup.
> > >And in close(timeout), after timeout we call abort() and join the sender
> > >thread. This was one of the previous proposal. We merged abort to
> close(0)
> > >because they are almost doing the same thing. But from what you
> mentioned,
> > >it might make sense to have two separate methods.
> > >
> > >Thanks.
> > >
> > >Jiangjie (Becket) Qin
> > >
> > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> > >
> > >>Yeah in this sense the sender thread will not exist immediately in the
> > >>close(0) call, but will only terminate after the current response batch
> > >>has
> > >>been processed, as will the producer instance itself.
> > >>
> > >>There is a reason for this though: for a clean shutdown the caller
> thread
> > >>has to wait for the sender thread to join before closing the producer
> > >>instance, but this cannot be achieve if close(0) is called by the
> sender
> > >>thread itself (for example in KAFKA-1659, there is a proposal from
> Andrew
> > >>Stein on using thread.interrupt and thread.stop, but if it is called by
> > >>the
> > >>ioThread itself the stop call will fail). Hence we came up with the
> flag
> > >>approach to let the sender thread to close as soon as it is at the
> > >>barrier
> > >>of the run loop.
> > >>
> > >>Guozhang
> > >>
> > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
> > >>
> > >>> Hmm, does that mean that after close(0), the sender thread is not
> > >>>necessary
> > >>> gone? Normally, after closing an entity, we expect all internal
> threads
> > >>> associated with the entity are shut down completely.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jun
> > >>>
> > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> > >>><jq...@linkedin.com.invalid>
> > >>> wrote:
> > >>>
> > >>> > Hi Jun,
> > >>> >
> > >>> > Close(0) will set two flags in sender. Running=false and a newly
> > >>>added
> > >>> > forceClose=true. It will also set accumulator.closed=true so no
> > >>>further
> > >>> > producer.send() will succeed.
> > >>> > The sender thread will finish executing all the callbacks in
> current
> > >>> batch
> > >>> > of responses, then it will see the forceClose flag. It will just
> fail
> > >>>all
> > >>> > the incomplete batches in the producer and exit.
> > >>> > So close(0) is a non-blocking call and sender thread will not try
> to
> > >>>join
> > >>> > itself in close(0).
> > >>> >
> > >>> > Thanks.
> > >>> >
> > >>> > Jiangjie (Becket) Qin
> > >>> >
> > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> > >>> >
> > >>> > >How does close(0) work if it's called from the sender thread? If
> > >>> close(0)
> > >>> > >needs to wait for the sender thread to join, wouldn't this cause a
> > >>> > >deadlock?
> > >>> > >
> > >>> > >Thanks,
> > >>> > >
> > >>> > >Jun
> > >>> > >
> > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> > >>><jqin@linkedin.com.invalid
> > >>> >
> > >>> > >wrote:
> > >>> > >
> > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> > >>> > >> discussing with you :)
> > >>> > >>
> > >>> > >> Jiangjie (Becket) Qin
> > >>> > >>
> > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com>
> wrote:
> > >>> > >>
> > >>> > >> >Thanks Jiangjie,
> > >>> > >> >
> > >>> > >> >After talking to you offline on this, I have been convinced and
> > >>> > >>changed my
> > >>> > >> >preference to blocking. The immediate shutdown approach does
> have
> > >>> some
> > >>> > >> >unsafeness in some cases.
> > >>> > >> >
> > >>> > >> >Guozhang
> > >>> > >> >
> > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > >>> > >><jqin@linkedin.com.invalid
> > >>> > >> >
> > >>> > >> >wrote:
> > >>> > >> >
> > >>> > >> >> It looks that the problem we want to solve and the purpose we
> > >>>want
> > >>> to
> > >>> > >> >> achieve is:
> > >>> > >> >> If user uses close() in callback, we want to let user be
> aware
> > >>>that
> > >>> > >>they
> > >>> > >> >> should use close(0) instead of close() in the callback.
> > >>> > >> >>
> > >>> > >> >> We have agreed that we will have an error log to inform user
> > >>>about
> > >>> > >>this
> > >>> > >> >> mis-usage. The options differ in the way how we can force
> user
> > >>>to
> > >>> > >>take a
> > >>> > >> >> look at that error log.
> > >>> > >> >> There are two scenarios:
> > >>> > >> >> 1. User does not expect the program to exit.
> > >>> > >> >> 2. User expect the program to exit.
> > >>> > >> >>
> > >>> > >> >> For scenario 1), blocking will probably delay the discovery
> of
> > >>>the
> > >>> > >> >> problem. Calling close(0) exposes the problem quicker. In
> this
> > >>> > >>scenario
> > >>> > >> >> producer just encounter a send failure when running normally.
> > >>> > >> >> For scenario 2), blocking will expose the problem quick.
> > >>>Calling
> > >>> > >> >>close(-1)
> > >>> > >> >> might hide the problem. This scenario might include: a) Unit
> > >>>test
> > >>> > >>for a
> > >>> > >> >> send failure. b) Message sending during a close() call from a
> > >>>user
> > >>> > >> >>thread.
> > >>> > >> >>
> > >>> > >> >> So as a summary table:
> > >>> > >> >>
> > >>> > >> >>                   Scenario 1)
>  Scenario
> > >>>2)
> > >>> > >> >>
> > >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> > >>>problem
> > >>> > >> >>discovery
> > >>> > >> >>
> > >>> > >> >> Close(-1)     Immediate problem discovery     Problem might
> be
> > >>> hidden
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >> Personally I prefer blocking because it seems providing more
> > >>> > >>guarantees
> > >>> > >> >> and safer.
> > >>> > >> >>
> > >>> > >> >> Thanks.
> > >>> > >> >>
> > >>> > >> >> Jiangjie (Becket) Qin
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
> > >>>wrote:
> > >>> > >> >>
> > >>> > >> >> >HI Jiangjie,
> > >>> > >> >> >
> > >>> > >> >> >As far as I understand calling close() in the ioThread is
> not
> > >>> > >>common,
> > >>> > >> >>as
> > >>> > >> >> >it
> > >>> > >> >> >may only trigger when we saw some non-retriable error. Hence
> > >>>when
> > >>> > >>user
> > >>> > >> >>run
> > >>> > >> >> >their program it is unlikely that close() will be triggered
> > >>>and
> > >>> > >>problem
> > >>> > >> >> >will be detected. So it seems to me that from the error
> > >>>detection
> > >>> > >> >>aspect
> > >>> > >> >> >these two options seems to be the same as people will
> usually
> > >>> > >>detect it
> > >>> > >> >> >from the producer metrics all dropping to 0.
> > >>> > >> >> >
> > >>> > >> >> >Guozhang
> > >>> > >> >> >
> > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > >>> > >> >><jq...@linkedin.com.invalid>
> > >>> > >> >> >wrote:
> > >>> > >> >> >
> > >>> > >> >> >> It seems there are two options we can choose from when
> > >>>close()
> > >>> is
> > >>> > >> >>called
> > >>> > >> >> >> from sender thread (callback):
> > >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> > >>> > >> >> >> 2. Log an error and block.
> > >>> > >> >> >> (Throwing an exception will not work because we catch all
> > >>>the
> > >>> > >> >>exception
> > >>> > >> >> >> thrown from user callback. It will just lead to an error
> > >>>log.)
> > >>> > >> >> >>
> > >>> > >> >> >> My concern for the first option is that the producer will
> be
> > >>> > >>closed
> > >>> > >> >>even
> > >>> > >> >> >> if we logged and error. I am wondering if some user would
> > >>>not
> > >>> even
> > >>> > >> >>take
> > >>> > >> >> >>a
> > >>> > >> >> >> look at the log if producer is closed normally. Because
> from
> > >>>the
> > >>> > >> >> >>programs
> > >>> > >> >> >> behavior, everything looks good. If that is the case, the
> > >>>error
> > >>> > >> >>message
> > >>> > >> >> >>we
> > >>> > >> >> >> logged probably will just be ignored until some day when
> > >>>people
> > >>> > >>check
> > >>> > >> >> >>the
> > >>> > >> >> >> log and see it.
> > >>> > >> >> >>
> > >>> > >> >> >> As for the second option, because producer does not close
> > >>>but
> > >>> > >>blocks.
> > >>> > >> >> >>User
> > >>> > >> >> >> will notice this the first time they run the program. They
> > >>> > >>probably
> > >>> > >> >>will
> > >>> > >> >> >> look at the log to see why producer could not be closed
> and
> > >>>they
> > >>> > >>will
> > >>> > >> >> >>see
> > >>> > >> >> >> the error log we put there. So they will get informed
> about
> > >>>this
> > >>> > >> >> >>mis-usage
> > >>> > >> >> >> of close() in sender thread the first time they run the
> code
> > >>> > >>instead
> > >>> > >> >>of
> > >>> > >> >> >> some time later.
> > >>> > >> >> >>
> > >>> > >> >> >> Personally I prefer the second one because it is more
> > >>>obvious
> > >>> that
> > >>> > >> >> >> something was wrong.
> > >>> > >> >> >>
> > >>> > >> >> >> Jiangjie (Becket) Qin
> > >>> > >> >> >>
> > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
> > >>> wrote:
> > >>> > >> >> >>
> > >>> > >> >> >> >Yeah I agree we should not silently change the behavior
> of
> > >>>the
> > >>> > >> >>function
> > >>> > >> >> >> >with the given parameters; and I would prefer
> > >>> > >> >> >>error-logging-and-shutdown
> > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> > >>>suggested
> > >>> > >> >>blocking
> > >>> > >> >> >> >would also not proceed with sending any data, bu will
> just
> > >>>let
> > >>> > >> >>users to
> > >>> > >> >> >> >realize the issue later than sooner.
> > >>> > >> >> >> >
> > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > >>> > >><ne...@confluent.io>
> > >>> > >> >> >>wrote:
> > >>> > >> >> >> >
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > And I also agree it is better if we can make producer
> > >>>block
> > >>> > >>when
> > >>> > >> >> >> >> > close() is called from sender thread so user will
> > >>>notice
> > >>> > >> >>something
> > >>> > >> >> >> >>went
> > >>> > >> >> >> >> > wrong.
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> This isn't a great experience either. Why can't we just
> > >>>throw
> > >>> > >>an
> > >>> > >> >> >> >>exception
> > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
> > >>>user to
> > >>> > >> >>know.
> > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> > >>>annoying.
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > >>> > >><ja...@gmail.com>
> > >>> > >> >> >> wrote:
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> > Cool.
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > I think blocking is good or alternately throwing an
> > >>> exception
> > >>> > >> >> >>directly
> > >>> > >> >> >> >> from
> > >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> > >>>doing
> > >>> > >> >>something
> > >>> > >> >> >> >> slightly
> > >>> > >> >> >> >> > different from what the user asked for as it will be
> > >>>hard
> > >>> to
> > >>> > >> >>notice
> > >>> > >> >> >> >>that
> > >>> > >> >> >> >> > behavior difference.
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > -Jay
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> > >>> > >> >> >> >><jqin@linkedin.com.invalid
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > wrote:
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >> > > Hi Jay,
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks
> as
> > >>> long
> > >>> > >>as
> > >>> > >> >>we
> > >>> > >> >> >> >>have
> > >>> > >> >> >> >> > > consistent define for timeout across Kafka
> interface,
> > >>> there
> > >>> > >> >>would
> > >>> > >> >> >> >>be no
> > >>> > >> >> >> >> > > problem. And I also agree it is better if we can
> make
> > >>> > >>producer
> > >>> > >> >> >>block
> > >>> > >> >> >> >> when
> > >>> > >> >> >> >> > > close() is called from sender thread so user will
> > >>>notice
> > >>> > >> >> >>something
> > >>> > >> >> >> >>went
> > >>> > >> >> >> >> > > wrong.
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > Thanks.
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> > >>><ja...@gmail.com>
> > >>> > >> wrote:
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > >Hey Jiangjie,
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >I think this is going to be very confusing that
> > >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> > >>> > >> >> >> >> > > >  close(-1) waits for 0.
> > >>> > >> >> >> >> > > >I understand this appears in other apis, but it
> is a
> > >>> > >>constant
> > >>> > >> >> >> >>cause of
> > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a
> way
> > >>>to
> > >>> > >>wait
> > >>> > >> >> >> >> indefinitely
> > >>> > >> >> >> >> > > >as
> > >>> > >> >> >> >> > > >we already have close() so having a magical
> constant
> > >>>for
> > >>> > >> >>that is
> > >>> > >> >> >> >> > > >redundant.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> > >>>possible
> > >>> > >>and
> > >>> > >> >> >>would
> > >>> > >> >> >> >> block
> > >>> > >> >> >> >> > > >indefinitely. I think trying to silently change
> the
> > >>> > >>behavior
> > >>> > >> >>is
> > >>> > >> >> >> >> probably
> > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> > >>> callback
> > >>> > >> >>there
> > >>> > >> >> >>is
> > >>> > >> >> >> >> > > >actually
> > >>> > >> >> >> >> > > >some misunderstanding and they need to think more,
> > >>> > >>silently
> > >>> > >> >> >>making
> > >>> > >> >> >> >> this
> > >>> > >> >> >> >> > > >not
> > >>> > >> >> >> >> > > >block will hide the problem from them which is the
> > >>> > >>opposite
> > >>> > >> >>of
> > >>> > >> >> >> >>what we
> > >>> > >> >> >> >> > > >want.
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >-Jay
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> > >>> > >> >> >> >> <jqin@linkedin.com.invalid
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > > >wrote:
> > >>> > >> >> >> >> > > >
> > >>> > >> >> >> >> > > >> Hey Joe & Jay,
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> > >>>Since it
> > >>> > >> >>seems
> > >>> > >> >> >>we
> > >>> > >> >> >> >> > probably
> > >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> > >>>replying
> > >>> > >>from
> > >>> > >> >>the
> > >>> > >> >> >> >> > > >>discussion
> > >>> > >> >> >> >> > > >> thread here.
> > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> > >>> > >>half-baked,
> > >>> > >> >> >> >>apologize
> > >>> > >> >> >> >> > for
> > >>> > >> >> >> >> > > >> the rush...
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> The contract in current KIP is:
> > >>> > >> >> >> >> > > >>   1. close() - wait until all requests either
> are
> > >>>sent
> > >>> > >>or
> > >>> > >> >> >>reach
> > >>> > >> >> >> >> > request
> > >>> > >> >> >> >> > > >> timeout.
> > >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> > >>> > >>immediately
> > >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) -
> equivalent
> > >>>to
> > >>> > >> >>close(),
> > >>> > >> >> >> >>i.e.
> > >>> > >> >> >> >> > Wait
> > >>> > >> >> >> >> > > >> until all requests are sent or reach request
> > >>>timeout
> > >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
> > >>>best to
> > >>> > >> >>finish
> > >>> > >> >> >> >> sending
> > >>> > >> >> >> >> > > >>in 5
> > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> > >>>shutdown
> > >>> the
> > >>> > >> >> >>producer
> > >>> > >> >> >> >> > > >>anyway,
> > >>> > >> >> >> >> > > >> my callback will handle the failures.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> About how we define what timeout value stands
> for,
> > >>>I
> > >>> > >> >>actually
> > >>> > >> >> >> >> > struggled
> > >>> > >> >> >> >> > > >>a
> > >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> > >>>close(0)
> > >>> > >> >>should
> > >>> > >> >> >> >>mean
> > >>> > >> >> >> >> > > >> immediately, however it seems that all the
> > >>>existing
> > >>> java
> > >>> > >> >>class
> > >>> > >> >> >> >>have
> > >>> > >> >> >> >> > this
> > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or
> never
> > >>> > >>timeout
> > >>> > >> >> >> >> > > >>(Thread.join(0),
> > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> > >>>either we
> > >>> > >> >>follow
> > >>> > >> >> >>the
> > >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> > >>>chose is
> > >>> > >>to
> > >>> > >> >> >>follow
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> convention but document the interface to let
> user
> > >>>be
> > >>> > >>aware
> > >>> > >> >>of
> > >>> > >> >> >>the
> > >>> > >> >> >> >> > usage.
> > >>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> > >>> public
> > >>> > >> >> >> >>interface so
> > >>> > >> >> >> >> > it
> > >>> > >> >> >> >> > > >> might be better to follow java convention.
> Whereas
> > >>> > >> >>selector is
> > >>> > >> >> >> >>not a
> > >>> > >> >> >> >> > > >> public interface that used by user, so as long
> as
> > >>>it
> > >>> > >>makes
> > >>> > >> >> >>sense
> > >>> > >> >> >> >>to
> > >>> > >> >> >> >> > us,
> > >>> > >> >> >> >> > > >>it
> > >>> > >> >> >> >> > > >> is less a problem to be different from java
> > >>> convention.
> > >>> > >> >>That
> > >>> > >> >> >>said
> > >>> > >> >> >> >> > since
> > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public
> interface,
> > >>>I
> > >>> > >>think
> > >>> > >> >>it
> > >>> > >> >> >> >>also
> > >>> > >> >> >> >> > makes
> > >>> > >> >> >> >> > > >> sense to make producer.close() to have the same
> > >>> > >>definition
> > >>> > >> >>of
> > >>> > >> >> >> >> > > >> consumer.poll(timeout).
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
> > >>>would
> > >>> > >>be
> > >>> > >> >> >> >>separating
> > >>> > >> >> >> >> > the
> > >>> > >> >> >> >> > > >> close timeout from request timeout, which
> probably
> > >>> makes
> > >>> > >> >> >>sense. I
> > >>> > >> >> >> >> > would
> > >>> > >> >> >> >> > > >> guess typically the request timeout would be
> long
> > >>> (e.g.
> > >>> > >>60
> > >>> > >> >> >> >>seconds)
> > >>> > >> >> >> >> > > >> because we might want to consider retries with
> > >>>back
> > >>> off
> > >>> > >> >>time.
> > >>> > >> >> >>If
> > >>> > >> >> >> >>we
> > >>> > >> >> >> >> > have
> > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
> > >>>that
> > >>> > >>could
> > >>> > >> >>take
> > >>> > >> >> >> >>up to
> > >>> > >> >> >> >> > > >> several minutes to complete all the requests.
> But
> > >>>when
> > >>> > >>we
> > >>> > >> >> >>close a
> > >>> > >> >> >> >> > > >> producer, we might not want to wait for that
> long
> > >>>as
> > >>> it
> > >>> > >> >>might
> > >>> > >> >> >> >>cause
> > >>> > >> >> >> >> > some
> > >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> There is also a subtle difference between
> > >>> close(timeout)
> > >>> > >> >>and
> > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is
> to
> > >>> write
> > >>> > >> >>data
> > >>> > >> >> >>to
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> > >>> request
> > >>> > >> >> >>timeout.
> > >>> > >> >> >> >>I
> > >>> > >> >> >> >> > think
> > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
> > >>>other
> > >>> > >> >>hand,
> > >>> > >> >> >>the
> > >>> > >> >> >> >>top
> > >>> > >> >> >> >> > > >> priority for close() is to close the producer
> > >>>rather
> > >>> > >>than
> > >>> > >> >> >>flush()
> > >>> > >> >> >> >> > data,
> > >>> > >> >> >> >> > > >>so
> > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded
> waiting
> > >>>for
> > >>> > >>its
> > >>> > >> >>main
> > >>> > >> >> >> >>job.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag.
> It
> > >>>is
> > >>> > >>not a
> > >>> > >> >> >>public
> > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> > >>>section
> > >>> > >> >>which I
> > >>> > >> >> >> >> thought
> > >>> > >> >> >> >> > > >>was
> > >>> > >> >> >> >> > > >> supposed to provide implementation details.
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Thanks again for all the comments and
> suggestions!
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> > >>> jqin@linkedin.com>
> > >>> > >> >> wrote:
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s
> comments.
> > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
> > >>>further
> > >>> > >> >> >>comments
> > >>> > >> >> >> >>are
> > >>> > >> >> >> >> > > >> >received by tomorrow.
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> > >>><jay.kreps@gmail.com
> > >>> >
> > >>> > >> >>wrote:
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> > >>>cases
> > >>> for
> > >>> > >>the
> > >>> > >> >> >> >>feature?
> > >>> > >> >> >> >> > > >>This
> > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> > >>>aborting
> > >>> > >>from
> > >>> > >> >> >>inside
> > >>> > >> >> >> >>the
> > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
> > >>>usage
> > >>> or
> > >>> > >>why
> > >>> > >> >> >>other
> > >>> > >> >> >> >> > people
> > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list
> the
> > >>> > >>primary
> > >>> > >> >>use
> > >>> > >> >> >> >>case
> > >>> > >> >> >> >> for
> > >>> > >> >> >> >> > > >> >>having
> > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to
> avoid
> > >>> > >>blocking
> > >>> > >> >>too
> > >>> > >> >> >> >>long
> > >>> > >> >> >> >> on
> > >>> > >> >> >> >> > > >> >>shutdown.
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>-Jay
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >> >>wrote:
> > >>> > >> >> >> >> > > >> >>
> > >>> > >> >> >> >> > > >> >>> Hi,
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> > >>>close(timeout)
> > >>> to
> > >>> > >>new
> > >>> > >> >> >> >> producer.
> > >>> > >> >> >> >> > > >>Most
> > >>> > >> >> >> >> > > >> >>>of
> > >>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
> > >>>where
> > >>> > >>Parth
> > >>> > >> >> >> >> Brahmbhatt
> > >>> > >> >> >> >> > > >>has
> > >>> > >> >> >> >> > > >> >>> already done a lot of work.
> > >>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
> > >>>going
> > >>> > >> >>through
> > >>> > >> >> >>the
> > >>> > >> >> >> >> KIP
> > >>> > >> >> >> >> > > >> >>>process.
> > >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >>
> > >>> > >> >> >>
> > >>> > >> >>
> > >>> > >>
> > >>> >
> > >>>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > >>> > >> >> >> >> > > >> >>>2
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> Thanks.
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > >>> > >> >> >> >> > > >> >>>
> > >>> > >> >> >> >> > > >> >
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > > >>
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> > >
> > >>> > >> >> >> >> >
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >>
> > >>> > >> >> >> >> --
> > >>> > >> >> >> >> Thanks,
> > >>> > >> >> >> >> Neha
> > >>> > >> >> >> >>
> > >>> > >> >> >> >
> > >>> > >> >> >> >
> > >>> > >> >> >> >
> > >>> > >> >> >> >--
> > >>> > >> >> >> >-- Guozhang
> > >>> > >> >> >>
> > >>> > >> >> >>
> > >>> > >> >> >
> > >>> > >> >> >
> > >>> > >> >> >--
> > >>> > >> >> >-- Guozhang
> > >>> > >> >>
> > >>> > >> >>
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >--
> > >>> > >> >-- Guozhang
> > >>> > >>
> > >>> > >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >>--
> > >>-- Guozhang
> > >
> >
> >
>



-- 
Thanks,
Neha

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Personally I'm in favor of (1) just to reduce the number of different APIs.
People will find the difference between abort and close subtle and
confusing and the only instance where you want it is this somewhat unusual
case you guys are pursuing, right?

-Jay

On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> It looks we have another option and are now deciding between the following
> two interfaces:
>
> 1. Close() + close(timeout)
>   - timeout could be either positive or zero.
>   - only close(0) can be called from sender thread
>
> 2. Close() + abort() + close(timeout)
>   - timeout can either be positive or zero
>   - only abort() can be called from sender thread
>
>   - abort() is equivalent to close(0) in 1) but does not join sender
> thread and does not close metrics.
>   - Another thread has to call close() or close(timeout) in order to make
> sure the resources in producer are gone.
>
> The tow approach provides the same function we need, the difference is
> approach 2) follows convention of close() and abort(). On the other hand,
> approach 1) saves one interface compared with approach 2) but does not
> follow the convention.
>
> When the two approaches come to user code, it is probably something like
> this:
>
> Try {
>   While(!finished)
>     Producer.send(record, callback)
> } catch (Exception e) {
>   Producer.close(5)
> }
>
> Class CallbackImpl implements Callback {
>   onCompletion(RecordMetadata metadata Exception e) {
>     If (e != null)
>       Abort() / close()
>   }
> }
>
> Because the two approach leads to almost the same user code, assuming
> users are always calling producer.close() as a clean up step, personally I
> prefer approach 2) as it follows convention.
>
> Any thoughts?
>
> Jiangjie (Becket) Qin
>
>
> On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>
> >Hi Jun,
> >
> >Yes, as Guozhang said, the main reason we set a flag is because close(0)
> >is expected to be called by sender thread itself.
> >If we want to maintain the semantic meaning of close(), one alternative is
> >to have an abort() method does the same thing as close(0) except cleanup.
> >And in close(timeout), after timeout we call abort() and join the sender
> >thread. This was one of the previous proposal. We merged abort to close(0)
> >because they are almost doing the same thing. But from what you mentioned,
> >it might make sense to have two separate methods.
> >
> >Thanks.
> >
> >Jiangjie (Becket) Qin
> >
> >On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >
> >>Yeah in this sense the sender thread will not exist immediately in the
> >>close(0) call, but will only terminate after the current response batch
> >>has
> >>been processed, as will the producer instance itself.
> >>
> >>There is a reason for this though: for a clean shutdown the caller thread
> >>has to wait for the sender thread to join before closing the producer
> >>instance, but this cannot be achieve if close(0) is called by the sender
> >>thread itself (for example in KAFKA-1659, there is a proposal from Andrew
> >>Stein on using thread.interrupt and thread.stop, but if it is called by
> >>the
> >>ioThread itself the stop call will fail). Hence we came up with the flag
> >>approach to let the sender thread to close as soon as it is at the
> >>barrier
> >>of the run loop.
> >>
> >>Guozhang
> >>
> >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
> >>
> >>> Hmm, does that mean that after close(0), the sender thread is not
> >>>necessary
> >>> gone? Normally, after closing an entity, we expect all internal threads
> >>> associated with the entity are shut down completely.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
> >>><jq...@linkedin.com.invalid>
> >>> wrote:
> >>>
> >>> > Hi Jun,
> >>> >
> >>> > Close(0) will set two flags in sender. Running=false and a newly
> >>>added
> >>> > forceClose=true. It will also set accumulator.closed=true so no
> >>>further
> >>> > producer.send() will succeed.
> >>> > The sender thread will finish executing all the callbacks in current
> >>> batch
> >>> > of responses, then it will see the forceClose flag. It will just fail
> >>>all
> >>> > the incomplete batches in the producer and exit.
> >>> > So close(0) is a non-blocking call and sender thread will not try to
> >>>join
> >>> > itself in close(0).
> >>> >
> >>> > Thanks.
> >>> >
> >>> > Jiangjie (Becket) Qin
> >>> >
> >>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> >>> >
> >>> > >How does close(0) work if it's called from the sender thread? If
> >>> close(0)
> >>> > >needs to wait for the sender thread to join, wouldn't this cause a
> >>> > >deadlock?
> >>> > >
> >>> > >Thanks,
> >>> > >
> >>> > >Jun
> >>> > >
> >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
> >>><jqin@linkedin.com.invalid
> >>> >
> >>> > >wrote:
> >>> > >
> >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> >>> > >> discussing with you :)
> >>> > >>
> >>> > >> Jiangjie (Becket) Qin
> >>> > >>
> >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >>> > >>
> >>> > >> >Thanks Jiangjie,
> >>> > >> >
> >>> > >> >After talking to you offline on this, I have been convinced and
> >>> > >>changed my
> >>> > >> >preference to blocking. The immediate shutdown approach does have
> >>> some
> >>> > >> >unsafeness in some cases.
> >>> > >> >
> >>> > >> >Guozhang
> >>> > >> >
> >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >>> > >><jqin@linkedin.com.invalid
> >>> > >> >
> >>> > >> >wrote:
> >>> > >> >
> >>> > >> >> It looks that the problem we want to solve and the purpose we
> >>>want
> >>> to
> >>> > >> >> achieve is:
> >>> > >> >> If user uses close() in callback, we want to let user be aware
> >>>that
> >>> > >>they
> >>> > >> >> should use close(0) instead of close() in the callback.
> >>> > >> >>
> >>> > >> >> We have agreed that we will have an error log to inform user
> >>>about
> >>> > >>this
> >>> > >> >> mis-usage. The options differ in the way how we can force user
> >>>to
> >>> > >>take a
> >>> > >> >> look at that error log.
> >>> > >> >> There are two scenarios:
> >>> > >> >> 1. User does not expect the program to exit.
> >>> > >> >> 2. User expect the program to exit.
> >>> > >> >>
> >>> > >> >> For scenario 1), blocking will probably delay the discovery of
> >>>the
> >>> > >> >> problem. Calling close(0) exposes the problem quicker. In this
> >>> > >>scenario
> >>> > >> >> producer just encounter a send failure when running normally.
> >>> > >> >> For scenario 2), blocking will expose the problem quick.
> >>>Calling
> >>> > >> >>close(-1)
> >>> > >> >> might hide the problem. This scenario might include: a) Unit
> >>>test
> >>> > >>for a
> >>> > >> >> send failure. b) Message sending during a close() call from a
> >>>user
> >>> > >> >>thread.
> >>> > >> >>
> >>> > >> >> So as a summary table:
> >>> > >> >>
> >>> > >> >>                   Scenario 1)                         Scenario
> >>>2)
> >>> > >> >>
> >>> > >> >> Blocking      Delay problem discovery         Guaranteed
> >>>problem
> >>> > >> >>discovery
> >>> > >> >>
> >>> > >> >> Close(-1)     Immediate problem discovery     Problem might be
> >>> hidden
> >>> > >> >>
> >>> > >> >>
> >>> > >> >> Personally I prefer blocking because it seems providing more
> >>> > >>guarantees
> >>> > >> >> and safer.
> >>> > >> >>
> >>> > >> >> Thanks.
> >>> > >> >>
> >>> > >> >> Jiangjie (Becket) Qin
> >>> > >> >>
> >>> > >> >>
> >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
> >>>wrote:
> >>> > >> >>
> >>> > >> >> >HI Jiangjie,
> >>> > >> >> >
> >>> > >> >> >As far as I understand calling close() in the ioThread is not
> >>> > >>common,
> >>> > >> >>as
> >>> > >> >> >it
> >>> > >> >> >may only trigger when we saw some non-retriable error. Hence
> >>>when
> >>> > >>user
> >>> > >> >>run
> >>> > >> >> >their program it is unlikely that close() will be triggered
> >>>and
> >>> > >>problem
> >>> > >> >> >will be detected. So it seems to me that from the error
> >>>detection
> >>> > >> >>aspect
> >>> > >> >> >these two options seems to be the same as people will usually
> >>> > >>detect it
> >>> > >> >> >from the producer metrics all dropping to 0.
> >>> > >> >> >
> >>> > >> >> >Guozhang
> >>> > >> >> >
> >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >>> > >> >><jq...@linkedin.com.invalid>
> >>> > >> >> >wrote:
> >>> > >> >> >
> >>> > >> >> >> It seems there are two options we can choose from when
> >>>close()
> >>> is
> >>> > >> >>called
> >>> > >> >> >> from sender thread (callback):
> >>> > >> >> >> 1. Log an error and close the producer using close(-1)
> >>> > >> >> >> 2. Log an error and block.
> >>> > >> >> >> (Throwing an exception will not work because we catch all
> >>>the
> >>> > >> >>exception
> >>> > >> >> >> thrown from user callback. It will just lead to an error
> >>>log.)
> >>> > >> >> >>
> >>> > >> >> >> My concern for the first option is that the producer will be
> >>> > >>closed
> >>> > >> >>even
> >>> > >> >> >> if we logged and error. I am wondering if some user would
> >>>not
> >>> even
> >>> > >> >>take
> >>> > >> >> >>a
> >>> > >> >> >> look at the log if producer is closed normally. Because from
> >>>the
> >>> > >> >> >>programs
> >>> > >> >> >> behavior, everything looks good. If that is the case, the
> >>>error
> >>> > >> >>message
> >>> > >> >> >>we
> >>> > >> >> >> logged probably will just be ignored until some day when
> >>>people
> >>> > >>check
> >>> > >> >> >>the
> >>> > >> >> >> log and see it.
> >>> > >> >> >>
> >>> > >> >> >> As for the second option, because producer does not close
> >>>but
> >>> > >>blocks.
> >>> > >> >> >>User
> >>> > >> >> >> will notice this the first time they run the program. They
> >>> > >>probably
> >>> > >> >>will
> >>> > >> >> >> look at the log to see why producer could not be closed and
> >>>they
> >>> > >>will
> >>> > >> >> >>see
> >>> > >> >> >> the error log we put there. So they will get informed about
> >>>this
> >>> > >> >> >>mis-usage
> >>> > >> >> >> of close() in sender thread the first time they run the code
> >>> > >>instead
> >>> > >> >>of
> >>> > >> >> >> some time later.
> >>> > >> >> >>
> >>> > >> >> >> Personally I prefer the second one because it is more
> >>>obvious
> >>> that
> >>> > >> >> >> something was wrong.
> >>> > >> >> >>
> >>> > >> >> >> Jiangjie (Becket) Qin
> >>> > >> >> >>
> >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
> >>> wrote:
> >>> > >> >> >>
> >>> > >> >> >> >Yeah I agree we should not silently change the behavior of
> >>>the
> >>> > >> >>function
> >>> > >> >> >> >with the given parameters; and I would prefer
> >>> > >> >> >>error-logging-and-shutdown
> >>> > >> >> >> >over blocking when close(>0) is used, since as Neha
> >>>suggested
> >>> > >> >>blocking
> >>> > >> >> >> >would also not proceed with sending any data, bu will just
> >>>let
> >>> > >> >>users to
> >>> > >> >> >> >realize the issue later than sooner.
> >>> > >> >> >> >
> >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >>> > >><ne...@confluent.io>
> >>> > >> >> >>wrote:
> >>> > >> >> >> >
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > And I also agree it is better if we can make producer
> >>>block
> >>> > >>when
> >>> > >> >> >> >> > close() is called from sender thread so user will
> >>>notice
> >>> > >> >>something
> >>> > >> >> >> >>went
> >>> > >> >> >> >> > wrong.
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >> This isn't a great experience either. Why can't we just
> >>>throw
> >>> > >>an
> >>> > >> >> >> >>exception
> >>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
> >>>user to
> >>> > >> >>know.
> >>> > >> >> >> >> Blocking as a means of doing that seems wrong and
> >>>annoying.
> >>> > >> >> >> >>
> >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >>> > >><ja...@gmail.com>
> >>> > >> >> >> wrote:
> >>> > >> >> >> >>
> >>> > >> >> >> >> > Cool.
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > I think blocking is good or alternately throwing an
> >>> exception
> >>> > >> >> >>directly
> >>> > >> >> >> >> from
> >>> > >> >> >> >> > close(). Basically I would just worry about subtly
> >>>doing
> >>> > >> >>something
> >>> > >> >> >> >> slightly
> >>> > >> >> >> >> > different from what the user asked for as it will be
> >>>hard
> >>> to
> >>> > >> >>notice
> >>> > >> >> >> >>that
> >>> > >> >> >> >> > behavior difference.
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > -Jay
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >>> > >> >> >> >><jqin@linkedin.com.invalid
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > wrote:
> >>> > >> >> >> >> >
> >>> > >> >> >> >> > > Hi Jay,
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks as
> >>> long
> >>> > >>as
> >>> > >> >>we
> >>> > >> >> >> >>have
> >>> > >> >> >> >> > > consistent define for timeout across Kafka interface,
> >>> there
> >>> > >> >>would
> >>> > >> >> >> >>be no
> >>> > >> >> >> >> > > problem. And I also agree it is better if we can make
> >>> > >>producer
> >>> > >> >> >>block
> >>> > >> >> >> >> when
> >>> > >> >> >> >> > > close() is called from sender thread so user will
> >>>notice
> >>> > >> >> >>something
> >>> > >> >> >> >>went
> >>> > >> >> >> >> > > wrong.
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > Thanks.
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > Jiangjie (Becket) Qin
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
> >>><ja...@gmail.com>
> >>> > >> wrote:
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > >Hey Jiangjie,
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >I think this is going to be very confusing that
> >>> > >> >> >> >> > > >  close(0) waits indefinitely and
> >>> > >> >> >> >> > > >  close(-1) waits for 0.
> >>> > >> >> >> >> > > >I understand this appears in other apis, but it is a
> >>> > >>constant
> >>> > >> >> >> >>cause of
> >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way
> >>>to
> >>> > >>wait
> >>> > >> >> >> >> indefinitely
> >>> > >> >> >> >> > > >as
> >>> > >> >> >> >> > > >we already have close() so having a magical constant
> >>>for
> >>> > >> >>that is
> >>> > >> >> >> >> > > >redundant.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >Calling close() from the I/O thread was already
> >>>possible
> >>> > >>and
> >>> > >> >> >>would
> >>> > >> >> >> >> block
> >>> > >> >> >> >> > > >indefinitely. I think trying to silently change the
> >>> > >>behavior
> >>> > >> >>is
> >>> > >> >> >> >> probably
> >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> >>> callback
> >>> > >> >>there
> >>> > >> >> >>is
> >>> > >> >> >> >> > > >actually
> >>> > >> >> >> >> > > >some misunderstanding and they need to think more,
> >>> > >>silently
> >>> > >> >> >>making
> >>> > >> >> >> >> this
> >>> > >> >> >> >> > > >not
> >>> > >> >> >> >> > > >block will hide the problem from them which is the
> >>> > >>opposite
> >>> > >> >>of
> >>> > >> >> >> >>what we
> >>> > >> >> >> >> > > >want.
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >-Jay
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >>> > >> >> >> >> <jqin@linkedin.com.invalid
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > > >wrote:
> >>> > >> >> >> >> > > >
> >>> > >> >> >> >> > > >> Hey Joe & Jay,
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
> >>>Since it
> >>> > >> >>seems
> >>> > >> >> >>we
> >>> > >> >> >> >> > probably
> >>> > >> >> >> >> > > >> will have more discussion on this, I am just
> >>>replying
> >>> > >>from
> >>> > >> >>the
> >>> > >> >> >> >> > > >>discussion
> >>> > >> >> >> >> > > >> thread here.
> >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> >>> > >>half-baked,
> >>> > >> >> >> >>apologize
> >>> > >> >> >> >> > for
> >>> > >> >> >> >> > > >> the rush...
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> The contract in current KIP is:
> >>> > >> >> >> >> > > >>   1. close() - wait until all requests either are
> >>>sent
> >>> > >>or
> >>> > >> >> >>reach
> >>> > >> >> >> >> > request
> >>> > >> >> >> >> > > >> timeout.
> >>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> >>> > >>immediately
> >>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent
> >>>to
> >>> > >> >>close(),
> >>> > >> >> >> >>i.e.
> >>> > >> >> >> >> > Wait
> >>> > >> >> >> >> > > >> until all requests are sent or reach request
> >>>timeout
> >>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
> >>>best to
> >>> > >> >>finish
> >>> > >> >> >> >> sending
> >>> > >> >> >> >> > > >>in 5
> >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
> >>>shutdown
> >>> the
> >>> > >> >> >>producer
> >>> > >> >> >> >> > > >>anyway,
> >>> > >> >> >> >> > > >> my callback will handle the failures.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> About how we define what timeout value stands for,
> >>>I
> >>> > >> >>actually
> >>> > >> >> >> >> > struggled
> >>> > >> >> >> >> > > >>a
> >>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
> >>>close(0)
> >>> > >> >>should
> >>> > >> >> >> >>mean
> >>> > >> >> >> >> > > >> immediately, however it seems that all the
> >>>existing
> >>> java
> >>> > >> >>class
> >>> > >> >> >> >>have
> >>> > >> >> >> >> > this
> >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or never
> >>> > >>timeout
> >>> > >> >> >> >> > > >>(Thread.join(0),
> >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
> >>>either we
> >>> > >> >>follow
> >>> > >> >> >>the
> >>> > >> >> >> >> > > >> intuition or we follow the convention. What I
> >>>chose is
> >>> > >>to
> >>> > >> >> >>follow
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> convention but document the interface to let user
> >>>be
> >>> > >>aware
> >>> > >> >>of
> >>> > >> >> >>the
> >>> > >> >> >> >> > usage.
> >>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> >>> public
> >>> > >> >> >> >>interface so
> >>> > >> >> >> >> > it
> >>> > >> >> >> >> > > >> might be better to follow java convention. Whereas
> >>> > >> >>selector is
> >>> > >> >> >> >>not a
> >>> > >> >> >> >> > > >> public interface that used by user, so as long as
> >>>it
> >>> > >>makes
> >>> > >> >> >>sense
> >>> > >> >> >> >>to
> >>> > >> >> >> >> > us,
> >>> > >> >> >> >> > > >>it
> >>> > >> >> >> >> > > >> is less a problem to be different from java
> >>> convention.
> >>> > >> >>That
> >>> > >> >> >>said
> >>> > >> >> >> >> > since
> >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public interface,
> >>>I
> >>> > >>think
> >>> > >> >>it
> >>> > >> >> >> >>also
> >>> > >> >> >> >> > makes
> >>> > >> >> >> >> > > >> sense to make producer.close() to have the same
> >>> > >>definition
> >>> > >> >>of
> >>> > >> >> >> >> > > >> consumer.poll(timeout).
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
> >>>would
> >>> > >>be
> >>> > >> >> >> >>separating
> >>> > >> >> >> >> > the
> >>> > >> >> >> >> > > >> close timeout from request timeout, which probably
> >>> makes
> >>> > >> >> >>sense. I
> >>> > >> >> >> >> > would
> >>> > >> >> >> >> > > >> guess typically the request timeout would be long
> >>> (e.g.
> >>> > >>60
> >>> > >> >> >> >>seconds)
> >>> > >> >> >> >> > > >> because we might want to consider retries with
> >>>back
> >>> off
> >>> > >> >>time.
> >>> > >> >> >>If
> >>> > >> >> >> >>we
> >>> > >> >> >> >> > have
> >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
> >>>that
> >>> > >>could
> >>> > >> >>take
> >>> > >> >> >> >>up to
> >>> > >> >> >> >> > > >> several minutes to complete all the requests. But
> >>>when
> >>> > >>we
> >>> > >> >> >>close a
> >>> > >> >> >> >> > > >> producer, we might not want to wait for that long
> >>>as
> >>> it
> >>> > >> >>might
> >>> > >> >> >> >>cause
> >>> > >> >> >> >> > some
> >>> > >> >> >> >> > > >> other problem like deployment tool timeout.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> There is also a subtle difference between
> >>> close(timeout)
> >>> > >> >>and
> >>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to
> >>> write
> >>> > >> >>data
> >>> > >> >> >>to
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> >>> request
> >>> > >> >> >>timeout.
> >>> > >> >> >> >>I
> >>> > >> >> >> >> > think
> >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
> >>>other
> >>> > >> >>hand,
> >>> > >> >> >>the
> >>> > >> >> >> >>top
> >>> > >> >> >> >> > > >> priority for close() is to close the producer
> >>>rather
> >>> > >>than
> >>> > >> >> >>flush()
> >>> > >> >> >> >> > data,
> >>> > >> >> >> >> > > >>so
> >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting
> >>>for
> >>> > >>its
> >>> > >> >>main
> >>> > >> >> >> >>job.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It
> >>>is
> >>> > >>not a
> >>> > >> >> >>public
> >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
> >>>section
> >>> > >> >>which I
> >>> > >> >> >> >> thought
> >>> > >> >> >> >> > > >>was
> >>> > >> >> >> >> > > >> supposed to provide implementation details.
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Thanks again for all the comments and suggestions!
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> >>> jqin@linkedin.com>
> >>> > >> >> wrote:
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
> >>>further
> >>> > >> >> >>comments
> >>> > >> >> >> >>are
> >>> > >> >> >> >> > > >> >received by tomorrow.
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
> >>><jay.kreps@gmail.com
> >>> >
> >>> > >> >>wrote:
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >>Hey Jiangjie,
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
> >>>cases
> >>> for
> >>> > >>the
> >>> > >> >> >> >>feature?
> >>> > >> >> >> >> > > >>This
> >>> > >> >> >> >> > > >> >>mentions your interest in having a way of
> >>>aborting
> >>> > >>from
> >>> > >> >> >>inside
> >>> > >> >> >> >>the
> >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
> >>>usage
> >>> or
> >>> > >>why
> >>> > >> >> >>other
> >>> > >> >> >> >> > people
> >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list the
> >>> > >>primary
> >>> > >> >>use
> >>> > >> >> >> >>case
> >>> > >> >> >> >> for
> >>> > >> >> >> >> > > >> >>having
> >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
> >>> > >>blocking
> >>> > >> >>too
> >>> > >> >> >> >>long
> >>> > >> >> >> >> on
> >>> > >> >> >> >> > > >> >>shutdown.
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>-Jay
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >> >>wrote:
> >>> > >> >> >> >> > > >> >>
> >>> > >> >> >> >> > > >> >>> Hi,
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
> >>>close(timeout)
> >>> to
> >>> > >>new
> >>> > >> >> >> >> producer.
> >>> > >> >> >> >> > > >>Most
> >>> > >> >> >> >> > > >> >>>of
> >>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
> >>>where
> >>> > >>Parth
> >>> > >> >> >> >> Brahmbhatt
> >>> > >> >> >> >> > > >>has
> >>> > >> >> >> >> > > >> >>> already done a lot of work.
> >>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
> >>>going
> >>> > >> >>through
> >>> > >> >> >>the
> >>> > >> >> >> >> KIP
> >>> > >> >> >> >> > > >> >>>process.
> >>> > >> >> >> >> > > >> >>> Here is the KIP link:
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > >
> >>> > >> >> >> >>
> >>> > >> >> >>
> >>> > >> >>
> >>> > >>
> >>> >
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >>> > >> >> >> >> > > >> >>>2
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> Thanks.
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >>> > >> >> >> >> > > >> >>>
> >>> > >> >> >> >> > > >> >
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > > >>
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> > >
> >>> > >> >> >> >> >
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >>
> >>> > >> >> >> >> --
> >>> > >> >> >> >> Thanks,
> >>> > >> >> >> >> Neha
> >>> > >> >> >> >>
> >>> > >> >> >> >
> >>> > >> >> >> >
> >>> > >> >> >> >
> >>> > >> >> >> >--
> >>> > >> >> >> >-- Guozhang
> >>> > >> >> >>
> >>> > >> >> >>
> >>> > >> >> >
> >>> > >> >> >
> >>> > >> >> >--
> >>> > >> >> >-- Guozhang
> >>> > >> >>
> >>> > >> >>
> >>> > >> >
> >>> > >> >
> >>> > >> >--
> >>> > >> >-- Guozhang
> >>> > >>
> >>> > >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
> >>--
> >>-- Guozhang
> >
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

It looks we have another option and are now deciding between the following
two interfaces:

1. Close() + close(timeout)
  - timeout could be either positive or zero.
  - only close(0) can be called from sender thread

2. Close() + abort() + close(timeout)
  - timeout can either be positive or zero
  - only abort() can be called from sender thread

  - abort() is equivalent to close(0) in 1) but does not join sender
thread and does not close metrics.
  - Another thread has to call close() or close(timeout) in order to make
sure the resources in producer are gone.

The tow approach provides the same function we need, the difference is
approach 2) follows convention of close() and abort(). On the other hand,
approach 1) saves one interface compared with approach 2) but does not
follow the convention.

When the two approaches come to user code, it is probably something like
this:

Try {
  While(!finished)
    Producer.send(record, callback)
} catch (Exception e) {
  Producer.close(5)
}

Class CallbackImpl implements Callback {
  onCompletion(RecordMetadata metadata Exception e) {
    If (e != null)
      Abort() / close()
  }
}

Because the two approach leads to almost the same user code, assuming
users are always calling producer.close() as a clean up step, personally I
prefer approach 2) as it follows convention.

Any thoughts?

Jiangjie (Becket) Qin


On 3/17/15, 10:25 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:

>Hi Jun,
>
>Yes, as Guozhang said, the main reason we set a flag is because close(0)
>is expected to be called by sender thread itself.
>If we want to maintain the semantic meaning of close(), one alternative is
>to have an abort() method does the same thing as close(0) except cleanup.
>And in close(timeout), after timeout we call abort() and join the sender
>thread. This was one of the previous proposal. We merged abort to close(0)
>because they are almost doing the same thing. But from what you mentioned,
>it might make sense to have two separate methods.
>
>Thanks.
>
>Jiangjie (Becket) Qin
>
>On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
>>Yeah in this sense the sender thread will not exist immediately in the
>>close(0) call, but will only terminate after the current response batch
>>has
>>been processed, as will the producer instance itself.
>>
>>There is a reason for this though: for a clean shutdown the caller thread
>>has to wait for the sender thread to join before closing the producer
>>instance, but this cannot be achieve if close(0) is called by the sender
>>thread itself (for example in KAFKA-1659, there is a proposal from Andrew
>>Stein on using thread.interrupt and thread.stop, but if it is called by
>>the
>>ioThread itself the stop call will fail). Hence we came up with the flag
>>approach to let the sender thread to close as soon as it is at the
>>barrier
>>of the run loop.
>>
>>Guozhang
>>
>>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
>>
>>> Hmm, does that mean that after close(0), the sender thread is not
>>>necessary
>>> gone? Normally, after closing an entity, we expect all internal threads
>>> associated with the entity are shut down completely.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>>><jq...@linkedin.com.invalid>
>>> wrote:
>>>
>>> > Hi Jun,
>>> >
>>> > Close(0) will set two flags in sender. Running=false and a newly
>>>added
>>> > forceClose=true. It will also set accumulator.closed=true so no
>>>further
>>> > producer.send() will succeed.
>>> > The sender thread will finish executing all the callbacks in current
>>> batch
>>> > of responses, then it will see the forceClose flag. It will just fail
>>>all
>>> > the incomplete batches in the producer and exit.
>>> > So close(0) is a non-blocking call and sender thread will not try to
>>>join
>>> > itself in close(0).
>>> >
>>> > Thanks.
>>> >
>>> > Jiangjie (Becket) Qin
>>> >
>>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
>>> >
>>> > >How does close(0) work if it's called from the sender thread? If
>>> close(0)
>>> > >needs to wait for the sender thread to join, wouldn't this cause a
>>> > >deadlock?
>>> > >
>>> > >Thanks,
>>> > >
>>> > >Jun
>>> > >
>>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>>><jqin@linkedin.com.invalid
>>> >
>>> > >wrote:
>>> > >
>>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
>>> > >> discussing with you :)
>>> > >>
>>> > >> Jiangjie (Becket) Qin
>>> > >>
>>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>>> > >>
>>> > >> >Thanks Jiangjie,
>>> > >> >
>>> > >> >After talking to you offline on this, I have been convinced and
>>> > >>changed my
>>> > >> >preference to blocking. The immediate shutdown approach does have
>>> some
>>> > >> >unsafeness in some cases.
>>> > >> >
>>> > >> >Guozhang
>>> > >> >
>>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>>> > >><jqin@linkedin.com.invalid
>>> > >> >
>>> > >> >wrote:
>>> > >> >
>>> > >> >> It looks that the problem we want to solve and the purpose we
>>>want
>>> to
>>> > >> >> achieve is:
>>> > >> >> If user uses close() in callback, we want to let user be aware
>>>that
>>> > >>they
>>> > >> >> should use close(0) instead of close() in the callback.
>>> > >> >>
>>> > >> >> We have agreed that we will have an error log to inform user
>>>about
>>> > >>this
>>> > >> >> mis-usage. The options differ in the way how we can force user
>>>to
>>> > >>take a
>>> > >> >> look at that error log.
>>> > >> >> There are two scenarios:
>>> > >> >> 1. User does not expect the program to exit.
>>> > >> >> 2. User expect the program to exit.
>>> > >> >>
>>> > >> >> For scenario 1), blocking will probably delay the discovery of
>>>the
>>> > >> >> problem. Calling close(0) exposes the problem quicker. In this
>>> > >>scenario
>>> > >> >> producer just encounter a send failure when running normally.
>>> > >> >> For scenario 2), blocking will expose the problem quick.
>>>Calling
>>> > >> >>close(-1)
>>> > >> >> might hide the problem. This scenario might include: a) Unit
>>>test
>>> > >>for a
>>> > >> >> send failure. b) Message sending during a close() call from a
>>>user
>>> > >> >>thread.
>>> > >> >>
>>> > >> >> So as a summary table:
>>> > >> >>
>>> > >> >>                   Scenario 1)                         Scenario
>>>2)
>>> > >> >>
>>> > >> >> Blocking      Delay problem discovery         Guaranteed
>>>problem
>>> > >> >>discovery
>>> > >> >>
>>> > >> >> Close(-1)     Immediate problem discovery     Problem might be
>>> hidden
>>> > >> >>
>>> > >> >>
>>> > >> >> Personally I prefer blocking because it seems providing more
>>> > >>guarantees
>>> > >> >> and safer.
>>> > >> >>
>>> > >> >> Thanks.
>>> > >> >>
>>> > >> >> Jiangjie (Becket) Qin
>>> > >> >>
>>> > >> >>
>>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
>>>wrote:
>>> > >> >>
>>> > >> >> >HI Jiangjie,
>>> > >> >> >
>>> > >> >> >As far as I understand calling close() in the ioThread is not
>>> > >>common,
>>> > >> >>as
>>> > >> >> >it
>>> > >> >> >may only trigger when we saw some non-retriable error. Hence
>>>when
>>> > >>user
>>> > >> >>run
>>> > >> >> >their program it is unlikely that close() will be triggered
>>>and
>>> > >>problem
>>> > >> >> >will be detected. So it seems to me that from the error
>>>detection
>>> > >> >>aspect
>>> > >> >> >these two options seems to be the same as people will usually
>>> > >>detect it
>>> > >> >> >from the producer metrics all dropping to 0.
>>> > >> >> >
>>> > >> >> >Guozhang
>>> > >> >> >
>>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>>> > >> >><jq...@linkedin.com.invalid>
>>> > >> >> >wrote:
>>> > >> >> >
>>> > >> >> >> It seems there are two options we can choose from when
>>>close()
>>> is
>>> > >> >>called
>>> > >> >> >> from sender thread (callback):
>>> > >> >> >> 1. Log an error and close the producer using close(-1)
>>> > >> >> >> 2. Log an error and block.
>>> > >> >> >> (Throwing an exception will not work because we catch all
>>>the
>>> > >> >>exception
>>> > >> >> >> thrown from user callback. It will just lead to an error
>>>log.)
>>> > >> >> >>
>>> > >> >> >> My concern for the first option is that the producer will be
>>> > >>closed
>>> > >> >>even
>>> > >> >> >> if we logged and error. I am wondering if some user would
>>>not
>>> even
>>> > >> >>take
>>> > >> >> >>a
>>> > >> >> >> look at the log if producer is closed normally. Because from
>>>the
>>> > >> >> >>programs
>>> > >> >> >> behavior, everything looks good. If that is the case, the
>>>error
>>> > >> >>message
>>> > >> >> >>we
>>> > >> >> >> logged probably will just be ignored until some day when
>>>people
>>> > >>check
>>> > >> >> >>the
>>> > >> >> >> log and see it.
>>> > >> >> >>
>>> > >> >> >> As for the second option, because producer does not close
>>>but
>>> > >>blocks.
>>> > >> >> >>User
>>> > >> >> >> will notice this the first time they run the program. They
>>> > >>probably
>>> > >> >>will
>>> > >> >> >> look at the log to see why producer could not be closed and
>>>they
>>> > >>will
>>> > >> >> >>see
>>> > >> >> >> the error log we put there. So they will get informed about
>>>this
>>> > >> >> >>mis-usage
>>> > >> >> >> of close() in sender thread the first time they run the code
>>> > >>instead
>>> > >> >>of
>>> > >> >> >> some time later.
>>> > >> >> >>
>>> > >> >> >> Personally I prefer the second one because it is more
>>>obvious
>>> that
>>> > >> >> >> something was wrong.
>>> > >> >> >>
>>> > >> >> >> Jiangjie (Becket) Qin
>>> > >> >> >>
>>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
>>> wrote:
>>> > >> >> >>
>>> > >> >> >> >Yeah I agree we should not silently change the behavior of
>>>the
>>> > >> >>function
>>> > >> >> >> >with the given parameters; and I would prefer
>>> > >> >> >>error-logging-and-shutdown
>>> > >> >> >> >over blocking when close(>0) is used, since as Neha
>>>suggested
>>> > >> >>blocking
>>> > >> >> >> >would also not proceed with sending any data, bu will just
>>>let
>>> > >> >>users to
>>> > >> >> >> >realize the issue later than sooner.
>>> > >> >> >> >
>>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>>> > >><ne...@confluent.io>
>>> > >> >> >>wrote:
>>> > >> >> >> >
>>> > >> >> >> >> >
>>> > >> >> >> >> > And I also agree it is better if we can make producer
>>>block
>>> > >>when
>>> > >> >> >> >> > close() is called from sender thread so user will
>>>notice
>>> > >> >>something
>>> > >> >> >> >>went
>>> > >> >> >> >> > wrong.
>>> > >> >> >> >>
>>> > >> >> >> >>
>>> > >> >> >> >> This isn't a great experience either. Why can't we just
>>>throw
>>> > >>an
>>> > >> >> >> >>exception
>>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
>>>user to
>>> > >> >>know.
>>> > >> >> >> >> Blocking as a means of doing that seems wrong and
>>>annoying.
>>> > >> >> >> >>
>>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>>> > >><ja...@gmail.com>
>>> > >> >> >> wrote:
>>> > >> >> >> >>
>>> > >> >> >> >> > Cool.
>>> > >> >> >> >> >
>>> > >> >> >> >> > I think blocking is good or alternately throwing an
>>> exception
>>> > >> >> >>directly
>>> > >> >> >> >> from
>>> > >> >> >> >> > close(). Basically I would just worry about subtly
>>>doing
>>> > >> >>something
>>> > >> >> >> >> slightly
>>> > >> >> >> >> > different from what the user asked for as it will be
>>>hard
>>> to
>>> > >> >>notice
>>> > >> >> >> >>that
>>> > >> >> >> >> > behavior difference.
>>> > >> >> >> >> >
>>> > >> >> >> >> > -Jay
>>> > >> >> >> >> >
>>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>>> > >> >> >> >><jqin@linkedin.com.invalid
>>> > >> >> >> >> >
>>> > >> >> >> >> > wrote:
>>> > >> >> >> >> >
>>> > >> >> >> >> > > Hi Jay,
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks as
>>> long
>>> > >>as
>>> > >> >>we
>>> > >> >> >> >>have
>>> > >> >> >> >> > > consistent define for timeout across Kafka interface,
>>> there
>>> > >> >>would
>>> > >> >> >> >>be no
>>> > >> >> >> >> > > problem. And I also agree it is better if we can make
>>> > >>producer
>>> > >> >> >>block
>>> > >> >> >> >> when
>>> > >> >> >> >> > > close() is called from sender thread so user will
>>>notice
>>> > >> >> >>something
>>> > >> >> >> >>went
>>> > >> >> >> >> > > wrong.
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > Thanks.
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > Jiangjie (Becket) Qin
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>>><ja...@gmail.com>
>>> > >> wrote:
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > >Hey Jiangjie,
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >I think this is going to be very confusing that
>>> > >> >> >> >> > > >  close(0) waits indefinitely and
>>> > >> >> >> >> > > >  close(-1) waits for 0.
>>> > >> >> >> >> > > >I understand this appears in other apis, but it is a
>>> > >>constant
>>> > >> >> >> >>cause of
>>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way
>>>to
>>> > >>wait
>>> > >> >> >> >> indefinitely
>>> > >> >> >> >> > > >as
>>> > >> >> >> >> > > >we already have close() so having a magical constant
>>>for
>>> > >> >>that is
>>> > >> >> >> >> > > >redundant.
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >Calling close() from the I/O thread was already
>>>possible
>>> > >>and
>>> > >> >> >>would
>>> > >> >> >> >> block
>>> > >> >> >> >> > > >indefinitely. I think trying to silently change the
>>> > >>behavior
>>> > >> >>is
>>> > >> >> >> >> probably
>>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
>>> callback
>>> > >> >>there
>>> > >> >> >>is
>>> > >> >> >> >> > > >actually
>>> > >> >> >> >> > > >some misunderstanding and they need to think more,
>>> > >>silently
>>> > >> >> >>making
>>> > >> >> >> >> this
>>> > >> >> >> >> > > >not
>>> > >> >> >> >> > > >block will hide the problem from them which is the
>>> > >>opposite
>>> > >> >>of
>>> > >> >> >> >>what we
>>> > >> >> >> >> > > >want.
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >-Jay
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>>> > >> >> >> >> <jqin@linkedin.com.invalid
>>> > >> >> >> >> > >
>>> > >> >> >> >> > > >wrote:
>>> > >> >> >> >> > > >
>>> > >> >> >> >> > > >> Hey Joe & Jay,
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
>>>Since it
>>> > >> >>seems
>>> > >> >> >>we
>>> > >> >> >> >> > probably
>>> > >> >> >> >> > > >> will have more discussion on this, I am just
>>>replying
>>> > >>from
>>> > >> >>the
>>> > >> >> >> >> > > >>discussion
>>> > >> >> >> >> > > >> thread here.
>>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
>>> > >>half-baked,
>>> > >> >> >> >>apologize
>>> > >> >> >> >> > for
>>> > >> >> >> >> > > >> the rush...
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> The contract in current KIP is:
>>> > >> >> >> >> > > >>   1. close() - wait until all requests either are
>>>sent
>>> > >>or
>>> > >> >> >>reach
>>> > >> >> >> >> > request
>>> > >> >> >> >> > > >> timeout.
>>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
>>> > >>immediately
>>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent
>>>to
>>> > >> >>close(),
>>> > >> >> >> >>i.e.
>>> > >> >> >> >> > Wait
>>> > >> >> >> >> > > >> until all requests are sent or reach request
>>>timeout
>>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
>>>best to
>>> > >> >>finish
>>> > >> >> >> >> sending
>>> > >> >> >> >> > > >>in 5
>>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
>>>shutdown
>>> the
>>> > >> >> >>producer
>>> > >> >> >> >> > > >>anyway,
>>> > >> >> >> >> > > >> my callback will handle the failures.
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> About how we define what timeout value stands for,
>>>I
>>> > >> >>actually
>>> > >> >> >> >> > struggled
>>> > >> >> >> >> > > >>a
>>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
>>>close(0)
>>> > >> >>should
>>> > >> >> >> >>mean
>>> > >> >> >> >> > > >> immediately, however it seems that all the
>>>existing
>>> java
>>> > >> >>class
>>> > >> >> >> >>have
>>> > >> >> >> >> > this
>>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or never
>>> > >>timeout
>>> > >> >> >> >> > > >>(Thread.join(0),
>>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
>>>either we
>>> > >> >>follow
>>> > >> >> >>the
>>> > >> >> >> >> > > >> intuition or we follow the convention. What I
>>>chose is
>>> > >>to
>>> > >> >> >>follow
>>> > >> >> >> >>the
>>> > >> >> >> >> > > >> convention but document the interface to let user
>>>be
>>> > >>aware
>>> > >> >>of
>>> > >> >> >>the
>>> > >> >> >> >> > usage.
>>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
>>> public
>>> > >> >> >> >>interface so
>>> > >> >> >> >> > it
>>> > >> >> >> >> > > >> might be better to follow java convention. Whereas
>>> > >> >>selector is
>>> > >> >> >> >>not a
>>> > >> >> >> >> > > >> public interface that used by user, so as long as
>>>it
>>> > >>makes
>>> > >> >> >>sense
>>> > >> >> >> >>to
>>> > >> >> >> >> > us,
>>> > >> >> >> >> > > >>it
>>> > >> >> >> >> > > >> is less a problem to be different from java
>>> convention.
>>> > >> >>That
>>> > >> >> >>said
>>> > >> >> >> >> > since
>>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public interface,
>>>I
>>> > >>think
>>> > >> >>it
>>> > >> >> >> >>also
>>> > >> >> >> >> > makes
>>> > >> >> >> >> > > >> sense to make producer.close() to have the same
>>> > >>definition
>>> > >> >>of
>>> > >> >> >> >> > > >> consumer.poll(timeout).
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
>>>would
>>> > >>be
>>> > >> >> >> >>separating
>>> > >> >> >> >> > the
>>> > >> >> >> >> > > >> close timeout from request timeout, which probably
>>> makes
>>> > >> >> >>sense. I
>>> > >> >> >> >> > would
>>> > >> >> >> >> > > >> guess typically the request timeout would be long
>>> (e.g.
>>> > >>60
>>> > >> >> >> >>seconds)
>>> > >> >> >> >> > > >> because we might want to consider retries with
>>>back
>>> off
>>> > >> >>time.
>>> > >> >> >>If
>>> > >> >> >> >>we
>>> > >> >> >> >> > have
>>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case
>>>that
>>> > >>could
>>> > >> >>take
>>> > >> >> >> >>up to
>>> > >> >> >> >> > > >> several minutes to complete all the requests. But
>>>when
>>> > >>we
>>> > >> >> >>close a
>>> > >> >> >> >> > > >> producer, we might not want to wait for that long
>>>as
>>> it
>>> > >> >>might
>>> > >> >> >> >>cause
>>> > >> >> >> >> > some
>>> > >> >> >> >> > > >> other problem like deployment tool timeout.
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> There is also a subtle difference between
>>> close(timeout)
>>> > >> >>and
>>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to
>>> write
>>> > >> >>data
>>> > >> >> >>to
>>> > >> >> >> >>the
>>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
>>> request
>>> > >> >> >>timeout.
>>> > >> >> >> >>I
>>> > >> >> >> >> > think
>>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
>>>other
>>> > >> >>hand,
>>> > >> >> >>the
>>> > >> >> >> >>top
>>> > >> >> >> >> > > >> priority for close() is to close the producer
>>>rather
>>> > >>than
>>> > >> >> >>flush()
>>> > >> >> >> >> > data,
>>> > >> >> >> >> > > >>so
>>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting
>>>for
>>> > >>its
>>> > >> >>main
>>> > >> >> >> >>job.
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It
>>>is
>>> > >>not a
>>> > >> >> >>public
>>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
>>>section
>>> > >> >>which I
>>> > >> >> >> >> thought
>>> > >> >> >> >> > > >>was
>>> > >> >> >> >> > > >> supposed to provide implementation details.
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> Thanks again for all the comments and suggestions!
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>>> jqin@linkedin.com>
>>> > >> >> wrote:
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
>>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
>>>further
>>> > >> >> >>comments
>>> > >> >> >> >>are
>>> > >> >> >> >> > > >> >received by tomorrow.
>>> > >> >> >> >> > > >> >
>>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>>> > >> >> >> >> > > >> >
>>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>>><jay.kreps@gmail.com
>>> >
>>> > >> >>wrote:
>>> > >> >> >> >> > > >> >
>>> > >> >> >> >> > > >> >>Hey Jiangjie,
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>Can you capture the full motivation and use
>>>cases
>>> for
>>> > >>the
>>> > >> >> >> >>feature?
>>> > >> >> >> >> > > >>This
>>> > >> >> >> >> > > >> >>mentions your interest in having a way of
>>>aborting
>>> > >>from
>>> > >> >> >>inside
>>> > >> >> >> >>the
>>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
>>>usage
>>> or
>>> > >>why
>>> > >> >> >>other
>>> > >> >> >> >> > people
>>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list the
>>> > >>primary
>>> > >> >>use
>>> > >> >> >> >>case
>>> > >> >> >> >> for
>>> > >> >> >> >> > > >> >>having
>>> > >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
>>> > >>blocking
>>> > >> >>too
>>> > >> >> >> >>long
>>> > >> >> >> >> on
>>> > >> >> >> >> > > >> >>shutdown.
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>-Jay
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>>> > >> >> >> >> > > >> >
>>> > >> >> >> >> > > >> >>wrote:
>>> > >> >> >> >> > > >> >>
>>> > >> >> >> >> > > >> >>> Hi,
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>>>close(timeout)
>>> to
>>> > >>new
>>> > >> >> >> >> producer.
>>> > >> >> >> >> > > >>Most
>>> > >> >> >> >> > > >> >>>of
>>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
>>>where
>>> > >>Parth
>>> > >> >> >> >> Brahmbhatt
>>> > >> >> >> >> > > >>has
>>> > >> >> >> >> > > >> >>> already done a lot of work.
>>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
>>>going
>>> > >> >>through
>>> > >> >> >>the
>>> > >> >> >> >> KIP
>>> > >> >> >> >> > > >> >>>process.
>>> > >> >> >> >> > > >> >>> Here is the KIP link:
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > >
>>> > >> >> >> >>
>>> > >> >> >>
>>> > >> >>
>>> > >>
>>> > 
>>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>>> > >> >> >> >> > > >> >>>2
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >> >>> Thanks.
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>>> > >> >> >> >> > > >> >>>
>>> > >> >> >> >> > > >> >
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > > >>
>>> > >> >> >> >> > >
>>> > >> >> >> >> > >
>>> > >> >> >> >> >
>>> > >> >> >> >>
>>> > >> >> >> >>
>>> > >> >> >> >>
>>> > >> >> >> >> --
>>> > >> >> >> >> Thanks,
>>> > >> >> >> >> Neha
>>> > >> >> >> >>
>>> > >> >> >> >
>>> > >> >> >> >
>>> > >> >> >> >
>>> > >> >> >> >--
>>> > >> >> >> >-- Guozhang
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >--
>>> > >> >> >-- Guozhang
>>> > >> >>
>>> > >> >>
>>> > >> >
>>> > >> >
>>> > >> >--
>>> > >> >-- Guozhang
>>> > >>
>>> > >>
>>> >
>>> >
>>>
>>
>>
>>
>>-- 
>>-- Guozhang
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Jun,

Yes, as Guozhang said, the main reason we set a flag is because close(0)
is expected to be called by sender thread itself.
If we want to maintain the semantic meaning of close(), one alternative is
to have an abort() method does the same thing as close(0) except cleanup.
And in close(timeout), after timeout we call abort() and join the sender
thread. This was one of the previous proposal. We merged abort to close(0)
because they are almost doing the same thing. But from what you mentioned,
it might make sense to have two separate methods.

Thanks.

Jiangjie (Becket) Qin

On 3/16/15, 10:31 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

>Yeah in this sense the sender thread will not exist immediately in the
>close(0) call, but will only terminate after the current response batch
>has
>been processed, as will the producer instance itself.
>
>There is a reason for this though: for a clean shutdown the caller thread
>has to wait for the sender thread to join before closing the producer
>instance, but this cannot be achieve if close(0) is called by the sender
>thread itself (for example in KAFKA-1659, there is a proposal from Andrew
>Stein on using thread.interrupt and thread.stop, but if it is called by
>the
>ioThread itself the stop call will fail). Hence we came up with the flag
>approach to let the sender thread to close as soon as it is at the barrier
>of the run loop.
>
>Guozhang
>
>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:
>
>> Hmm, does that mean that after close(0), the sender thread is not
>>necessary
>> gone? Normally, after closing an entity, we expect all internal threads
>> associated with the entity are shut down completely.
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> wrote:
>>
>> > Hi Jun,
>> >
>> > Close(0) will set two flags in sender. Running=false and a newly added
>> > forceClose=true. It will also set accumulator.closed=true so no
>>further
>> > producer.send() will succeed.
>> > The sender thread will finish executing all the callbacks in current
>> batch
>> > of responses, then it will see the forceClose flag. It will just fail
>>all
>> > the incomplete batches in the producer and exit.
>> > So close(0) is a non-blocking call and sender thread will not try to
>>join
>> > itself in close(0).
>> >
>> > Thanks.
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
>> >
>> > >How does close(0) work if it's called from the sender thread? If
>> close(0)
>> > >needs to wait for the sender thread to join, wouldn't this cause a
>> > >deadlock?
>> > >
>> > >Thanks,
>> > >
>> > >Jun
>> > >
>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> > >wrote:
>> > >
>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
>> > >> discussing with you :)
>> > >>
>> > >> Jiangjie (Becket) Qin
>> > >>
>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> > >>
>> > >> >Thanks Jiangjie,
>> > >> >
>> > >> >After talking to you offline on this, I have been convinced and
>> > >>changed my
>> > >> >preference to blocking. The immediate shutdown approach does have
>> some
>> > >> >unsafeness in some cases.
>> > >> >
>> > >> >Guozhang
>> > >> >
>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>> > >><jqin@linkedin.com.invalid
>> > >> >
>> > >> >wrote:
>> > >> >
>> > >> >> It looks that the problem we want to solve and the purpose we
>>want
>> to
>> > >> >> achieve is:
>> > >> >> If user uses close() in callback, we want to let user be aware
>>that
>> > >>they
>> > >> >> should use close(0) instead of close() in the callback.
>> > >> >>
>> > >> >> We have agreed that we will have an error log to inform user
>>about
>> > >>this
>> > >> >> mis-usage. The options differ in the way how we can force user
>>to
>> > >>take a
>> > >> >> look at that error log.
>> > >> >> There are two scenarios:
>> > >> >> 1. User does not expect the program to exit.
>> > >> >> 2. User expect the program to exit.
>> > >> >>
>> > >> >> For scenario 1), blocking will probably delay the discovery of
>>the
>> > >> >> problem. Calling close(0) exposes the problem quicker. In this
>> > >>scenario
>> > >> >> producer just encounter a send failure when running normally.
>> > >> >> For scenario 2), blocking will expose the problem quick. Calling
>> > >> >>close(-1)
>> > >> >> might hide the problem. This scenario might include: a) Unit
>>test
>> > >>for a
>> > >> >> send failure. b) Message sending during a close() call from a
>>user
>> > >> >>thread.
>> > >> >>
>> > >> >> So as a summary table:
>> > >> >>
>> > >> >>                   Scenario 1)                         Scenario
>>2)
>> > >> >>
>> > >> >> Blocking      Delay problem discovery         Guaranteed problem
>> > >> >>discovery
>> > >> >>
>> > >> >> Close(-1)     Immediate problem discovery     Problem might be
>> hidden
>> > >> >>
>> > >> >>
>> > >> >> Personally I prefer blocking because it seems providing more
>> > >>guarantees
>> > >> >> and safer.
>> > >> >>
>> > >> >> Thanks.
>> > >> >>
>> > >> >> Jiangjie (Becket) Qin
>> > >> >>
>> > >> >>
>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com>
>>wrote:
>> > >> >>
>> > >> >> >HI Jiangjie,
>> > >> >> >
>> > >> >> >As far as I understand calling close() in the ioThread is not
>> > >>common,
>> > >> >>as
>> > >> >> >it
>> > >> >> >may only trigger when we saw some non-retriable error. Hence
>>when
>> > >>user
>> > >> >>run
>> > >> >> >their program it is unlikely that close() will be triggered and
>> > >>problem
>> > >> >> >will be detected. So it seems to me that from the error
>>detection
>> > >> >>aspect
>> > >> >> >these two options seems to be the same as people will usually
>> > >>detect it
>> > >> >> >from the producer metrics all dropping to 0.
>> > >> >> >
>> > >> >> >Guozhang
>> > >> >> >
>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> > >> >><jq...@linkedin.com.invalid>
>> > >> >> >wrote:
>> > >> >> >
>> > >> >> >> It seems there are two options we can choose from when
>>close()
>> is
>> > >> >>called
>> > >> >> >> from sender thread (callback):
>> > >> >> >> 1. Log an error and close the producer using close(-1)
>> > >> >> >> 2. Log an error and block.
>> > >> >> >> (Throwing an exception will not work because we catch all the
>> > >> >>exception
>> > >> >> >> thrown from user callback. It will just lead to an error
>>log.)
>> > >> >> >>
>> > >> >> >> My concern for the first option is that the producer will be
>> > >>closed
>> > >> >>even
>> > >> >> >> if we logged and error. I am wondering if some user would not
>> even
>> > >> >>take
>> > >> >> >>a
>> > >> >> >> look at the log if producer is closed normally. Because from
>>the
>> > >> >> >>programs
>> > >> >> >> behavior, everything looks good. If that is the case, the
>>error
>> > >> >>message
>> > >> >> >>we
>> > >> >> >> logged probably will just be ignored until some day when
>>people
>> > >>check
>> > >> >> >>the
>> > >> >> >> log and see it.
>> > >> >> >>
>> > >> >> >> As for the second option, because producer does not close but
>> > >>blocks.
>> > >> >> >>User
>> > >> >> >> will notice this the first time they run the program. They
>> > >>probably
>> > >> >>will
>> > >> >> >> look at the log to see why producer could not be closed and
>>they
>> > >>will
>> > >> >> >>see
>> > >> >> >> the error log we put there. So they will get informed about
>>this
>> > >> >> >>mis-usage
>> > >> >> >> of close() in sender thread the first time they run the code
>> > >>instead
>> > >> >>of
>> > >> >> >> some time later.
>> > >> >> >>
>> > >> >> >> Personally I prefer the second one because it is more obvious
>> that
>> > >> >> >> something was wrong.
>> > >> >> >>
>> > >> >> >> Jiangjie (Becket) Qin
>> > >> >> >>
>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
>> wrote:
>> > >> >> >>
>> > >> >> >> >Yeah I agree we should not silently change the behavior of
>>the
>> > >> >>function
>> > >> >> >> >with the given parameters; and I would prefer
>> > >> >> >>error-logging-and-shutdown
>> > >> >> >> >over blocking when close(>0) is used, since as Neha
>>suggested
>> > >> >>blocking
>> > >> >> >> >would also not proceed with sending any data, bu will just
>>let
>> > >> >>users to
>> > >> >> >> >realize the issue later than sooner.
>> > >> >> >> >
>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>> > >><ne...@confluent.io>
>> > >> >> >>wrote:
>> > >> >> >> >
>> > >> >> >> >> >
>> > >> >> >> >> > And I also agree it is better if we can make producer
>>block
>> > >>when
>> > >> >> >> >> > close() is called from sender thread so user will notice
>> > >> >>something
>> > >> >> >> >>went
>> > >> >> >> >> > wrong.
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> This isn't a great experience either. Why can't we just
>>throw
>> > >>an
>> > >> >> >> >>exception
>> > >> >> >> >> for a behavior we know is incorrect and we'd like the
>>user to
>> > >> >>know.
>> > >> >> >> >> Blocking as a means of doing that seems wrong and
>>annoying.
>> > >> >> >> >>
>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>> > >><ja...@gmail.com>
>> > >> >> >> wrote:
>> > >> >> >> >>
>> > >> >> >> >> > Cool.
>> > >> >> >> >> >
>> > >> >> >> >> > I think blocking is good or alternately throwing an
>> exception
>> > >> >> >>directly
>> > >> >> >> >> from
>> > >> >> >> >> > close(). Basically I would just worry about subtly doing
>> > >> >>something
>> > >> >> >> >> slightly
>> > >> >> >> >> > different from what the user asked for as it will be
>>hard
>> to
>> > >> >>notice
>> > >> >> >> >>that
>> > >> >> >> >> > behavior difference.
>> > >> >> >> >> >
>> > >> >> >> >> > -Jay
>> > >> >> >> >> >
>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> > >> >> >> >><jqin@linkedin.com.invalid
>> > >> >> >> >> >
>> > >> >> >> >> > wrote:
>> > >> >> >> >> >
>> > >> >> >> >> > > Hi Jay,
>> > >> >> >> >> > >
>> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks as
>> long
>> > >>as
>> > >> >>we
>> > >> >> >> >>have
>> > >> >> >> >> > > consistent define for timeout across Kafka interface,
>> there
>> > >> >>would
>> > >> >> >> >>be no
>> > >> >> >> >> > > problem. And I also agree it is better if we can make
>> > >>producer
>> > >> >> >>block
>> > >> >> >> >> when
>> > >> >> >> >> > > close() is called from sender thread so user will
>>notice
>> > >> >> >>something
>> > >> >> >> >>went
>> > >> >> >> >> > > wrong.
>> > >> >> >> >> > >
>> > >> >> >> >> > > Thanks.
>> > >> >> >> >> > >
>> > >> >> >> >> > > Jiangjie (Becket) Qin
>> > >> >> >> >> > >
>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps"
>><ja...@gmail.com>
>> > >> wrote:
>> > >> >> >> >> > >
>> > >> >> >> >> > > >Hey Jiangjie,
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >I think this is going to be very confusing that
>> > >> >> >> >> > > >  close(0) waits indefinitely and
>> > >> >> >> >> > > >  close(-1) waits for 0.
>> > >> >> >> >> > > >I understand this appears in other apis, but it is a
>> > >>constant
>> > >> >> >> >>cause of
>> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way
>>to
>> > >>wait
>> > >> >> >> >> indefinitely
>> > >> >> >> >> > > >as
>> > >> >> >> >> > > >we already have close() so having a magical constant
>>for
>> > >> >>that is
>> > >> >> >> >> > > >redundant.
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >Calling close() from the I/O thread was already
>>possible
>> > >>and
>> > >> >> >>would
>> > >> >> >> >> block
>> > >> >> >> >> > > >indefinitely. I think trying to silently change the
>> > >>behavior
>> > >> >>is
>> > >> >> >> >> probably
>> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
>> callback
>> > >> >>there
>> > >> >> >>is
>> > >> >> >> >> > > >actually
>> > >> >> >> >> > > >some misunderstanding and they need to think more,
>> > >>silently
>> > >> >> >>making
>> > >> >> >> >> this
>> > >> >> >> >> > > >not
>> > >> >> >> >> > > >block will hide the problem from them which is the
>> > >>opposite
>> > >> >>of
>> > >> >> >> >>what we
>> > >> >> >> >> > > >want.
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >-Jay
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> > >> >> >> >> <jqin@linkedin.com.invalid
>> > >> >> >> >> > >
>> > >> >> >> >> > > >wrote:
>> > >> >> >> >> > > >
>> > >> >> >> >> > > >> Hey Joe & Jay,
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> Thanks for the comments on the voting thread.
>>Since it
>> > >> >>seems
>> > >> >> >>we
>> > >> >> >> >> > probably
>> > >> >> >> >> > > >> will have more discussion on this, I am just
>>replying
>> > >>from
>> > >> >>the
>> > >> >> >> >> > > >>discussion
>> > >> >> >> >> > > >> thread here.
>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
>> > >>half-baked,
>> > >> >> >> >>apologize
>> > >> >> >> >> > for
>> > >> >> >> >> > > >> the rush...
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> The contract in current KIP is:
>> > >> >> >> >> > > >>   1. close() - wait until all requests either are
>>sent
>> > >>or
>> > >> >> >>reach
>> > >> >> >> >> > request
>> > >> >> >> >> > > >> timeout.
>> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
>> > >>immediately
>> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent
>>to
>> > >> >>close(),
>> > >> >> >> >>i.e.
>> > >> >> >> >> > Wait
>> > >> >> >> >> > > >> until all requests are sent or reach request
>>timeout
>> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the
>>best to
>> > >> >>finish
>> > >> >> >> >> sending
>> > >> >> >> >> > > >>in 5
>> > >> >> >> >> > > >> milliseconds, if something went wrong, just
>>shutdown
>> the
>> > >> >> >>producer
>> > >> >> >> >> > > >>anyway,
>> > >> >> >> >> > > >> my callback will handle the failures.
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> About how we define what timeout value stands for,
>>I
>> > >> >>actually
>> > >> >> >> >> > struggled
>> > >> >> >> >> > > >>a
>> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively,
>>close(0)
>> > >> >>should
>> > >> >> >> >>mean
>> > >> >> >> >> > > >> immediately, however it seems that all the existing
>> java
>> > >> >>class
>> > >> >> >> >>have
>> > >> >> >> >> > this
>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or never
>> > >>timeout
>> > >> >> >> >> > > >>(Thread.join(0),
>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is
>>either we
>> > >> >>follow
>> > >> >> >>the
>> > >> >> >> >> > > >> intuition or we follow the convention. What I
>>chose is
>> > >>to
>> > >> >> >>follow
>> > >> >> >> >>the
>> > >> >> >> >> > > >> convention but document the interface to let user
>>be
>> > >>aware
>> > >> >>of
>> > >> >> >>the
>> > >> >> >> >> > usage.
>> > >> >> >> >> > > >> The reason is that I think producer.close() is a
>> public
>> > >> >> >> >>interface so
>> > >> >> >> >> > it
>> > >> >> >> >> > > >> might be better to follow java convention. Whereas
>> > >> >>selector is
>> > >> >> >> >>not a
>> > >> >> >> >> > > >> public interface that used by user, so as long as
>>it
>> > >>makes
>> > >> >> >>sense
>> > >> >> >> >>to
>> > >> >> >> >> > us,
>> > >> >> >> >> > > >>it
>> > >> >> >> >> > > >> is less a problem to be different from java
>> convention.
>> > >> >>That
>> > >> >> >>said
>> > >> >> >> >> > since
>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public interface,
>>I
>> > >>think
>> > >> >>it
>> > >> >> >> >>also
>> > >> >> >> >> > makes
>> > >> >> >> >> > > >> sense to make producer.close() to have the same
>> > >>definition
>> > >> >>of
>> > >> >> >> >> > > >> consumer.poll(timeout).
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> The main argument for keeping a timeout in close
>>would
>> > >>be
>> > >> >> >> >>separating
>> > >> >> >> >> > the
>> > >> >> >> >> > > >> close timeout from request timeout, which probably
>> makes
>> > >> >> >>sense. I
>> > >> >> >> >> > would
>> > >> >> >> >> > > >> guess typically the request timeout would be long
>> (e.g.
>> > >>60
>> > >> >> >> >>seconds)
>> > >> >> >> >> > > >> because we might want to consider retries with back
>> off
>> > >> >>time.
>> > >> >> >>If
>> > >> >> >> >>we
>> > >> >> >> >> > have
>> > >> >> >> >> > > >> multiple batches in accumulator, in worst case that
>> > >>could
>> > >> >>take
>> > >> >> >> >>up to
>> > >> >> >> >> > > >> several minutes to complete all the requests. But
>>when
>> > >>we
>> > >> >> >>close a
>> > >> >> >> >> > > >> producer, we might not want to wait for that long
>>as
>> it
>> > >> >>might
>> > >> >> >> >>cause
>> > >> >> >> >> > some
>> > >> >> >> >> > > >> other problem like deployment tool timeout.
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> There is also a subtle difference between
>> close(timeout)
>> > >> >>and
>> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to
>> write
>> > >> >>data
>> > >> >> >>to
>> > >> >> >> >>the
>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
>> request
>> > >> >> >>timeout.
>> > >> >> >> >>I
>> > >> >> >> >> > think
>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the
>>other
>> > >> >>hand,
>> > >> >> >>the
>> > >> >> >> >>top
>> > >> >> >> >> > > >> priority for close() is to close the producer
>>rather
>> > >>than
>> > >> >> >>flush()
>> > >> >> >> >> > data,
>> > >> >> >> >> > > >>so
>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting
>>for
>> > >>its
>> > >> >>main
>> > >> >> >> >>job.
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It
>>is
>> > >>not a
>> > >> >> >>public
>> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes
>>section
>> > >> >>which I
>> > >> >> >> >> thought
>> > >> >> >> >> > > >>was
>> > >> >> >> >> > > >> supposed to provide implementation details.
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> Thanks again for all the comments and suggestions!
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> Jiangjie (Becket) Qin
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
>> jqin@linkedin.com>
>> > >> >> wrote:
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
>> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no
>>further
>> > >> >> >>comments
>> > >> >> >> >>are
>> > >> >> >> >> > > >> >received by tomorrow.
>> > >> >> >> >> > > >> >
>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
>> > >> >> >> >> > > >> >
>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps"
>><jay.kreps@gmail.com
>> >
>> > >> >>wrote:
>> > >> >> >> >> > > >> >
>> > >> >> >> >> > > >> >>Hey Jiangjie,
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>Can you capture the full motivation and use cases
>> for
>> > >>the
>> > >> >> >> >>feature?
>> > >> >> >> >> > > >>This
>> > >> >> >> >> > > >> >>mentions your interest in having a way of
>>aborting
>> > >>from
>> > >> >> >>inside
>> > >> >> >> >>the
>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that
>>usage
>> or
>> > >>why
>> > >> >> >>other
>> > >> >> >> >> > people
>> > >> >> >> >> > > >> >>would want to do that. It also doesn't list the
>> > >>primary
>> > >> >>use
>> > >> >> >> >>case
>> > >> >> >> >> for
>> > >> >> >> >> > > >> >>having
>> > >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
>> > >>blocking
>> > >> >>too
>> > >> >> >> >>long
>> > >> >> >> >> on
>> > >> >> >> >> > > >> >>shutdown.
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>-Jay
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> > >> >> >> >> > > >><jqin@linkedin.com.invalid
>> > >> >> >> >> > > >> >
>> > >> >> >> >> > > >> >>wrote:
>> > >> >> >> >> > > >> >>
>> > >> >> >> >> > > >> >>> Hi,
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >> >>> I just created a KIP for adding a
>>close(timeout)
>> to
>> > >>new
>> > >> >> >> >> producer.
>> > >> >> >> >> > > >>Most
>> > >> >> >> >> > > >> >>>of
>> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660
>>where
>> > >>Parth
>> > >> >> >> >> Brahmbhatt
>> > >> >> >> >> > > >>has
>> > >> >> >> >> > > >> >>> already done a lot of work.
>> > >> >> >> >> > > >> >>> Since this is an interface change so we are
>>going
>> > >> >>through
>> > >> >> >>the
>> > >> >> >> >> KIP
>> > >> >> >> >> > > >> >>>process.
>> > >> >> >> >> > > >> >>> Here is the KIP link:
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >>
>> > >> >> >> >> > >
>> > >> >> >> >>
>> > >> >> >>
>> > >> >>
>> > >>
>> > 
>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> > >> >> >> >> > > >> >>>2
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >> >>> Thanks.
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> > >> >> >> >> > > >> >>>
>> > >> >> >> >> > > >> >
>> > >> >> >> >> > > >>
>> > >> >> >> >> > > >>
>> > >> >> >> >> > >
>> > >> >> >> >> > >
>> > >> >> >> >> >
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >> Thanks,
>> > >> >> >> >> Neha
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >--
>> > >> >> >> >-- Guozhang
>> > >> >> >>
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >--
>> > >> >> >-- Guozhang
>> > >> >>
>> > >> >>
>> > >> >
>> > >> >
>> > >> >--
>> > >> >-- Guozhang
>> > >>
>> > >>
>> >
>> >
>>
>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

Yeah in this sense the sender thread will not exist immediately in the
close(0) call, but will only terminate after the current response batch has
been processed, as will the producer instance itself.

There is a reason for this though: for a clean shutdown the caller thread
has to wait for the sender thread to join before closing the producer
instance, but this cannot be achieve if close(0) is called by the sender
thread itself (for example in KAFKA-1659, there is a proposal from Andrew
Stein on using thread.interrupt and thread.stop, but if it is called by the
ioThread itself the stop call will fail). Hence we came up with the flag
approach to let the sender thread to close as soon as it is at the barrier
of the run loop.

Guozhang

On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <ju...@confluent.io> wrote:

> Hmm, does that mean that after close(0), the sender thread is not necessary
> gone? Normally, after closing an entity, we expect all internal threads
> associated with the entity are shut down completely.
>
> Thanks,
>
> Jun
>
> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > Hi Jun,
> >
> > Close(0) will set two flags in sender. Running=false and a newly added
> > forceClose=true. It will also set accumulator.closed=true so no further
> > producer.send() will succeed.
> > The sender thread will finish executing all the callbacks in current
> batch
> > of responses, then it will see the forceClose flag. It will just fail all
> > the incomplete batches in the producer and exit.
> > So close(0) is a non-blocking call and sender thread will not try to join
> > itself in close(0).
> >
> > Thanks.
> >
> > Jiangjie (Becket) Qin
> >
> > On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
> >
> > >How does close(0) work if it's called from the sender thread? If
> close(0)
> > >needs to wait for the sender thread to join, wouldn't this cause a
> > >deadlock?
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > >wrote:
> > >
> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> > >> discussing with you :)
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> > >>
> > >> >Thanks Jiangjie,
> > >> >
> > >> >After talking to you offline on this, I have been convinced and
> > >>changed my
> > >> >preference to blocking. The immediate shutdown approach does have
> some
> > >> >unsafeness in some cases.
> > >> >
> > >> >Guozhang
> > >> >
> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> > >><jqin@linkedin.com.invalid
> > >> >
> > >> >wrote:
> > >> >
> > >> >> It looks that the problem we want to solve and the purpose we want
> to
> > >> >> achieve is:
> > >> >> If user uses close() in callback, we want to let user be aware that
> > >>they
> > >> >> should use close(0) instead of close() in the callback.
> > >> >>
> > >> >> We have agreed that we will have an error log to inform user about
> > >>this
> > >> >> mis-usage. The options differ in the way how we can force user to
> > >>take a
> > >> >> look at that error log.
> > >> >> There are two scenarios:
> > >> >> 1. User does not expect the program to exit.
> > >> >> 2. User expect the program to exit.
> > >> >>
> > >> >> For scenario 1), blocking will probably delay the discovery of the
> > >> >> problem. Calling close(0) exposes the problem quicker. In this
> > >>scenario
> > >> >> producer just encounter a send failure when running normally.
> > >> >> For scenario 2), blocking will expose the problem quick. Calling
> > >> >>close(-1)
> > >> >> might hide the problem. This scenario might include: a) Unit test
> > >>for a
> > >> >> send failure. b) Message sending during a close() call from a user
> > >> >>thread.
> > >> >>
> > >> >> So as a summary table:
> > >> >>
> > >> >>                   Scenario 1)                         Scenario 2)
> > >> >>
> > >> >> Blocking      Delay problem discovery         Guaranteed problem
> > >> >>discovery
> > >> >>
> > >> >> Close(-1)     Immediate problem discovery     Problem might be
> hidden
> > >> >>
> > >> >>
> > >> >> Personally I prefer blocking because it seems providing more
> > >>guarantees
> > >> >> and safer.
> > >> >>
> > >> >> Thanks.
> > >> >>
> > >> >> Jiangjie (Becket) Qin
> > >> >>
> > >> >>
> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
> > >> >>
> > >> >> >HI Jiangjie,
> > >> >> >
> > >> >> >As far as I understand calling close() in the ioThread is not
> > >>common,
> > >> >>as
> > >> >> >it
> > >> >> >may only trigger when we saw some non-retriable error. Hence when
> > >>user
> > >> >>run
> > >> >> >their program it is unlikely that close() will be triggered and
> > >>problem
> > >> >> >will be detected. So it seems to me that from the error detection
> > >> >>aspect
> > >> >> >these two options seems to be the same as people will usually
> > >>detect it
> > >> >> >from the producer metrics all dropping to 0.
> > >> >> >
> > >> >> >Guozhang
> > >> >> >
> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> > >> >><jq...@linkedin.com.invalid>
> > >> >> >wrote:
> > >> >> >
> > >> >> >> It seems there are two options we can choose from when close()
> is
> > >> >>called
> > >> >> >> from sender thread (callback):
> > >> >> >> 1. Log an error and close the producer using close(-1)
> > >> >> >> 2. Log an error and block.
> > >> >> >> (Throwing an exception will not work because we catch all the
> > >> >>exception
> > >> >> >> thrown from user callback. It will just lead to an error log.)
> > >> >> >>
> > >> >> >> My concern for the first option is that the producer will be
> > >>closed
> > >> >>even
> > >> >> >> if we logged and error. I am wondering if some user would not
> even
> > >> >>take
> > >> >> >>a
> > >> >> >> look at the log if producer is closed normally. Because from the
> > >> >> >>programs
> > >> >> >> behavior, everything looks good. If that is the case, the error
> > >> >>message
> > >> >> >>we
> > >> >> >> logged probably will just be ignored until some day when people
> > >>check
> > >> >> >>the
> > >> >> >> log and see it.
> > >> >> >>
> > >> >> >> As for the second option, because producer does not close but
> > >>blocks.
> > >> >> >>User
> > >> >> >> will notice this the first time they run the program. They
> > >>probably
> > >> >>will
> > >> >> >> look at the log to see why producer could not be closed and they
> > >>will
> > >> >> >>see
> > >> >> >> the error log we put there. So they will get informed about this
> > >> >> >>mis-usage
> > >> >> >> of close() in sender thread the first time they run the code
> > >>instead
> > >> >>of
> > >> >> >> some time later.
> > >> >> >>
> > >> >> >> Personally I prefer the second one because it is more obvious
> that
> > >> >> >> something was wrong.
> > >> >> >>
> > >> >> >> Jiangjie (Becket) Qin
> > >> >> >>
> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com>
> wrote:
> > >> >> >>
> > >> >> >> >Yeah I agree we should not silently change the behavior of the
> > >> >>function
> > >> >> >> >with the given parameters; and I would prefer
> > >> >> >>error-logging-and-shutdown
> > >> >> >> >over blocking when close(>0) is used, since as Neha suggested
> > >> >>blocking
> > >> >> >> >would also not proceed with sending any data, bu will just let
> > >> >>users to
> > >> >> >> >realize the issue later than sooner.
> > >> >> >> >
> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> > >><ne...@confluent.io>
> > >> >> >>wrote:
> > >> >> >> >
> > >> >> >> >> >
> > >> >> >> >> > And I also agree it is better if we can make producer block
> > >>when
> > >> >> >> >> > close() is called from sender thread so user will notice
> > >> >>something
> > >> >> >> >>went
> > >> >> >> >> > wrong.
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> This isn't a great experience either. Why can't we just throw
> > >>an
> > >> >> >> >>exception
> > >> >> >> >> for a behavior we know is incorrect and we'd like the user to
> > >> >>know.
> > >> >> >> >> Blocking as a means of doing that seems wrong and annoying.
> > >> >> >> >>
> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> > >><ja...@gmail.com>
> > >> >> >> wrote:
> > >> >> >> >>
> > >> >> >> >> > Cool.
> > >> >> >> >> >
> > >> >> >> >> > I think blocking is good or alternately throwing an
> exception
> > >> >> >>directly
> > >> >> >> >> from
> > >> >> >> >> > close(). Basically I would just worry about subtly doing
> > >> >>something
> > >> >> >> >> slightly
> > >> >> >> >> > different from what the user asked for as it will be hard
> to
> > >> >>notice
> > >> >> >> >>that
> > >> >> >> >> > behavior difference.
> > >> >> >> >> >
> > >> >> >> >> > -Jay
> > >> >> >> >> >
> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> > >> >> >> >><jqin@linkedin.com.invalid
> > >> >> >> >> >
> > >> >> >> >> > wrote:
> > >> >> >> >> >
> > >> >> >> >> > > Hi Jay,
> > >> >> >> >> > >
> > >> >> >> >> > > I have modified the KIP as you suggested. I thinks as
> long
> > >>as
> > >> >>we
> > >> >> >> >>have
> > >> >> >> >> > > consistent define for timeout across Kafka interface,
> there
> > >> >>would
> > >> >> >> >>be no
> > >> >> >> >> > > problem. And I also agree it is better if we can make
> > >>producer
> > >> >> >>block
> > >> >> >> >> when
> > >> >> >> >> > > close() is called from sender thread so user will notice
> > >> >> >>something
> > >> >> >> >>went
> > >> >> >> >> > > wrong.
> > >> >> >> >> > >
> > >> >> >> >> > > Thanks.
> > >> >> >> >> > >
> > >> >> >> >> > > Jiangjie (Becket) Qin
> > >> >> >> >> > >
> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com>
> > >> wrote:
> > >> >> >> >> > >
> > >> >> >> >> > > >Hey Jiangjie,
> > >> >> >> >> > > >
> > >> >> >> >> > > >I think this is going to be very confusing that
> > >> >> >> >> > > >  close(0) waits indefinitely and
> > >> >> >> >> > > >  close(-1) waits for 0.
> > >> >> >> >> > > >I understand this appears in other apis, but it is a
> > >>constant
> > >> >> >> >>cause of
> > >> >> >> >> > > >bugs. Let's not repeat that mistake.
> > >> >> >> >> > > >
> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to
> > >>wait
> > >> >> >> >> indefinitely
> > >> >> >> >> > > >as
> > >> >> >> >> > > >we already have close() so having a magical constant for
> > >> >>that is
> > >> >> >> >> > > >redundant.
> > >> >> >> >> > > >
> > >> >> >> >> > > >Calling close() from the I/O thread was already possible
> > >>and
> > >> >> >>would
> > >> >> >> >> block
> > >> >> >> >> > > >indefinitely. I think trying to silently change the
> > >>behavior
> > >> >>is
> > >> >> >> >> probably
> > >> >> >> >> > > >not right. I.e. if the user calls close() in the
> callback
> > >> >>there
> > >> >> >>is
> > >> >> >> >> > > >actually
> > >> >> >> >> > > >some misunderstanding and they need to think more,
> > >>silently
> > >> >> >>making
> > >> >> >> >> this
> > >> >> >> >> > > >not
> > >> >> >> >> > > >block will hide the problem from them which is the
> > >>opposite
> > >> >>of
> > >> >> >> >>what we
> > >> >> >> >> > > >want.
> > >> >> >> >> > > >
> > >> >> >> >> > > >-Jay
> > >> >> >> >> > > >
> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> > >> >> >> >> <jqin@linkedin.com.invalid
> > >> >> >> >> > >
> > >> >> >> >> > > >wrote:
> > >> >> >> >> > > >
> > >> >> >> >> > > >> Hey Joe & Jay,
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> Thanks for the comments on the voting thread. Since it
> > >> >>seems
> > >> >> >>we
> > >> >> >> >> > probably
> > >> >> >> >> > > >> will have more discussion on this, I am just replying
> > >>from
> > >> >>the
> > >> >> >> >> > > >>discussion
> > >> >> >> >> > > >> thread here.
> > >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> > >>half-baked,
> > >> >> >> >>apologize
> > >> >> >> >> > for
> > >> >> >> >> > > >> the rush...
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> The contract in current KIP is:
> > >> >> >> >> > > >>   1. close() - wait until all requests either are sent
> > >>or
> > >> >> >>reach
> > >> >> >> >> > request
> > >> >> >> >> > > >> timeout.
> > >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> > >>immediately
> > >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
> > >> >>close(),
> > >> >> >> >>i.e.
> > >> >> >> >> > Wait
> > >> >> >> >> > > >> until all requests are sent or reach request timeout
> > >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
> > >> >>finish
> > >> >> >> >> sending
> > >> >> >> >> > > >>in 5
> > >> >> >> >> > > >> milliseconds, if something went wrong, just shutdown
> the
> > >> >> >>producer
> > >> >> >> >> > > >>anyway,
> > >> >> >> >> > > >> my callback will handle the failures.
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> About how we define what timeout value stands for, I
> > >> >>actually
> > >> >> >> >> > struggled
> > >> >> >> >> > > >>a
> > >> >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
> > >> >>should
> > >> >> >> >>mean
> > >> >> >> >> > > >> immediately, however it seems that all the existing
> java
> > >> >>class
> > >> >> >> >>have
> > >> >> >> >> > this
> > >> >> >> >> > > >> convention of timeout=0 means no timeout or never
> > >>timeout
> > >> >> >> >> > > >>(Thread.join(0),
> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
> > >> >>follow
> > >> >> >>the
> > >> >> >> >> > > >> intuition or we follow the convention. What I chose is
> > >>to
> > >> >> >>follow
> > >> >> >> >>the
> > >> >> >> >> > > >> convention but document the interface to let user be
> > >>aware
> > >> >>of
> > >> >> >>the
> > >> >> >> >> > usage.
> > >> >> >> >> > > >> The reason is that I think producer.close() is a
> public
> > >> >> >> >>interface so
> > >> >> >> >> > it
> > >> >> >> >> > > >> might be better to follow java convention. Whereas
> > >> >>selector is
> > >> >> >> >>not a
> > >> >> >> >> > > >> public interface that used by user, so as long as it
> > >>makes
> > >> >> >>sense
> > >> >> >> >>to
> > >> >> >> >> > us,
> > >> >> >> >> > > >>it
> > >> >> >> >> > > >> is less a problem to be different from java
> convention.
> > >> >>That
> > >> >> >>said
> > >> >> >> >> > since
> > >> >> >> >> > > >> consumer.poll(timeout) is also a public interface, I
> > >>think
> > >> >>it
> > >> >> >> >>also
> > >> >> >> >> > makes
> > >> >> >> >> > > >> sense to make producer.close() to have the same
> > >>definition
> > >> >>of
> > >> >> >> >> > > >> consumer.poll(timeout).
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> The main argument for keeping a timeout in close would
> > >>be
> > >> >> >> >>separating
> > >> >> >> >> > the
> > >> >> >> >> > > >> close timeout from request timeout, which probably
> makes
> > >> >> >>sense. I
> > >> >> >> >> > would
> > >> >> >> >> > > >> guess typically the request timeout would be long
> (e.g.
> > >>60
> > >> >> >> >>seconds)
> > >> >> >> >> > > >> because we might want to consider retries with back
> off
> > >> >>time.
> > >> >> >>If
> > >> >> >> >>we
> > >> >> >> >> > have
> > >> >> >> >> > > >> multiple batches in accumulator, in worst case that
> > >>could
> > >> >>take
> > >> >> >> >>up to
> > >> >> >> >> > > >> several minutes to complete all the requests. But when
> > >>we
> > >> >> >>close a
> > >> >> >> >> > > >> producer, we might not want to wait for that long as
> it
> > >> >>might
> > >> >> >> >>cause
> > >> >> >> >> > some
> > >> >> >> >> > > >> other problem like deployment tool timeout.
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> There is also a subtle difference between
> close(timeout)
> > >> >>and
> > >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to
> write
> > >> >>data
> > >> >> >>to
> > >> >> >> >>the
> > >> >> >> >> > > >> broker, so it makes perfect sense to wait until
> request
> > >> >> >>timeout.
> > >> >> >> >>I
> > >> >> >> >> > think
> > >> >> >> >> > > >> that is why flush(timeout) looks strange. On the other
> > >> >>hand,
> > >> >> >>the
> > >> >> >> >>top
> > >> >> >> >> > > >> priority for close() is to close the producer rather
> > >>than
> > >> >> >>flush()
> > >> >> >> >> > data,
> > >> >> >> >> > > >>so
> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for
> > >>its
> > >> >>main
> > >> >> >> >>job.
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is
> > >>not a
> > >> >> >>public
> > >> >> >> >> > > >> interface. I mentioned it in Proposed Changes section
> > >> >>which I
> > >> >> >> >> thought
> > >> >> >> >> > > >>was
> > >> >> >> >> > > >> supposed to provide implementation details.
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> Thanks again for all the comments and suggestions!
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> Jiangjie (Becket) Qin
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <
> jqin@linkedin.com>
> > >> >> wrote:
> > >> >> >> >> > > >>
> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> > >> >> >> >> > > >> >I¹d like to initiate the voting process if no further
> > >> >> >>comments
> > >> >> >> >>are
> > >> >> >> >> > > >> >received by tomorrow.
> > >> >> >> >> > > >> >
> > >> >> >> >> > > >> >Jiangjie (Becket) Qin
> > >> >> >> >> > > >> >
> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <jay.kreps@gmail.com
> >
> > >> >>wrote:
> > >> >> >> >> > > >> >
> > >> >> >> >> > > >> >>Hey Jiangjie,
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>Can you capture the full motivation and use cases
> for
> > >>the
> > >> >> >> >>feature?
> > >> >> >> >> > > >>This
> > >> >> >> >> > > >> >>mentions your interest in having a way of aborting
> > >>from
> > >> >> >>inside
> > >> >> >> >>the
> > >> >> >> >> > > >> >>Callback. But it doesn't really explain that usage
> or
> > >>why
> > >> >> >>other
> > >> >> >> >> > people
> > >> >> >> >> > > >> >>would want to do that. It also doesn't list the
> > >>primary
> > >> >>use
> > >> >> >> >>case
> > >> >> >> >> for
> > >> >> >> >> > > >> >>having
> > >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
> > >>blocking
> > >> >>too
> > >> >> >> >>long
> > >> >> >> >> on
> > >> >> >> >> > > >> >>shutdown.
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>-Jay
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > >> >> >> >> > > >><jqin@linkedin.com.invalid
> > >> >> >> >> > > >> >
> > >> >> >> >> > > >> >>wrote:
> > >> >> >> >> > > >> >>
> > >> >> >> >> > > >> >>> Hi,
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout)
> to
> > >>new
> > >> >> >> >> producer.
> > >> >> >> >> > > >>Most
> > >> >> >> >> > > >> >>>of
> > >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where
> > >>Parth
> > >> >> >> >> Brahmbhatt
> > >> >> >> >> > > >>has
> > >> >> >> >> > > >> >>> already done a lot of work.
> > >> >> >> >> > > >> >>> Since this is an interface change so we are going
> > >> >>through
> > >> >> >>the
> > >> >> >> >> KIP
> > >> >> >> >> > > >> >>>process.
> > >> >> >> >> > > >> >>> Here is the KIP link:
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >>
> > >> >> >> >> > >
> > >> >> >> >>
> > >> >> >>
> > >> >>
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > >> >> >> >> > > >> >>>2
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >> >>> Thanks.
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> > >> >> >> >> > > >> >>>
> > >> >> >> >> > > >> >
> > >> >> >> >> > > >>
> > >> >> >> >> > > >>
> > >> >> >> >> > >
> > >> >> >> >> > >
> > >> >> >> >> >
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> --
> > >> >> >> >> Thanks,
> > >> >> >> >> Neha
> > >> >> >> >>
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >--
> > >> >> >> >-- Guozhang
> > >> >> >>
> > >> >> >>
> > >> >> >
> > >> >> >
> > >> >> >--
> > >> >> >-- Guozhang
> > >> >>
> > >> >>
> > >> >
> > >> >
> > >> >--
> > >> >-- Guozhang
> > >>
> > >>
> >
> >
>



-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

Hmm, does that mean that after close(0), the sender thread is not necessary
gone? Normally, after closing an entity, we expect all internal threads
associated with the entity are shut down completely.

Thanks,

Jun

On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi Jun,
>
> Close(0) will set two flags in sender. Running=false and a newly added
> forceClose=true. It will also set accumulator.closed=true so no further
> producer.send() will succeed.
> The sender thread will finish executing all the callbacks in current batch
> of responses, then it will see the forceClose flag. It will just fail all
> the incomplete batches in the producer and exit.
> So close(0) is a non-blocking call and sender thread will not try to join
> itself in close(0).
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
> On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:
>
> >How does close(0) work if it's called from the sender thread? If close(0)
> >needs to wait for the sender thread to join, wouldn't this cause a
> >deadlock?
> >
> >Thanks,
> >
> >Jun
> >
> >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> >wrote:
> >
> >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> >> discussing with you :)
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >>
> >> >Thanks Jiangjie,
> >> >
> >> >After talking to you offline on this, I have been convinced and
> >>changed my
> >> >preference to blocking. The immediate shutdown approach does have some
> >> >unsafeness in some cases.
> >> >
> >> >Guozhang
> >> >
> >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >><jqin@linkedin.com.invalid
> >> >
> >> >wrote:
> >> >
> >> >> It looks that the problem we want to solve and the purpose we want to
> >> >> achieve is:
> >> >> If user uses close() in callback, we want to let user be aware that
> >>they
> >> >> should use close(0) instead of close() in the callback.
> >> >>
> >> >> We have agreed that we will have an error log to inform user about
> >>this
> >> >> mis-usage. The options differ in the way how we can force user to
> >>take a
> >> >> look at that error log.
> >> >> There are two scenarios:
> >> >> 1. User does not expect the program to exit.
> >> >> 2. User expect the program to exit.
> >> >>
> >> >> For scenario 1), blocking will probably delay the discovery of the
> >> >> problem. Calling close(0) exposes the problem quicker. In this
> >>scenario
> >> >> producer just encounter a send failure when running normally.
> >> >> For scenario 2), blocking will expose the problem quick. Calling
> >> >>close(-1)
> >> >> might hide the problem. This scenario might include: a) Unit test
> >>for a
> >> >> send failure. b) Message sending during a close() call from a user
> >> >>thread.
> >> >>
> >> >> So as a summary table:
> >> >>
> >> >>                   Scenario 1)                         Scenario 2)
> >> >>
> >> >> Blocking      Delay problem discovery         Guaranteed problem
> >> >>discovery
> >> >>
> >> >> Close(-1)     Immediate problem discovery     Problem might be hidden
> >> >>
> >> >>
> >> >> Personally I prefer blocking because it seems providing more
> >>guarantees
> >> >> and safer.
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Jiangjie (Becket) Qin
> >> >>
> >> >>
> >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >> >>
> >> >> >HI Jiangjie,
> >> >> >
> >> >> >As far as I understand calling close() in the ioThread is not
> >>common,
> >> >>as
> >> >> >it
> >> >> >may only trigger when we saw some non-retriable error. Hence when
> >>user
> >> >>run
> >> >> >their program it is unlikely that close() will be triggered and
> >>problem
> >> >> >will be detected. So it seems to me that from the error detection
> >> >>aspect
> >> >> >these two options seems to be the same as people will usually
> >>detect it
> >> >> >from the producer metrics all dropping to 0.
> >> >> >
> >> >> >Guozhang
> >> >> >
> >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >> >><jq...@linkedin.com.invalid>
> >> >> >wrote:
> >> >> >
> >> >> >> It seems there are two options we can choose from when close() is
> >> >>called
> >> >> >> from sender thread (callback):
> >> >> >> 1. Log an error and close the producer using close(-1)
> >> >> >> 2. Log an error and block.
> >> >> >> (Throwing an exception will not work because we catch all the
> >> >>exception
> >> >> >> thrown from user callback. It will just lead to an error log.)
> >> >> >>
> >> >> >> My concern for the first option is that the producer will be
> >>closed
> >> >>even
> >> >> >> if we logged and error. I am wondering if some user would not even
> >> >>take
> >> >> >>a
> >> >> >> look at the log if producer is closed normally. Because from the
> >> >> >>programs
> >> >> >> behavior, everything looks good. If that is the case, the error
> >> >>message
> >> >> >>we
> >> >> >> logged probably will just be ignored until some day when people
> >>check
> >> >> >>the
> >> >> >> log and see it.
> >> >> >>
> >> >> >> As for the second option, because producer does not close but
> >>blocks.
> >> >> >>User
> >> >> >> will notice this the first time they run the program. They
> >>probably
> >> >>will
> >> >> >> look at the log to see why producer could not be closed and they
> >>will
> >> >> >>see
> >> >> >> the error log we put there. So they will get informed about this
> >> >> >>mis-usage
> >> >> >> of close() in sender thread the first time they run the code
> >>instead
> >> >>of
> >> >> >> some time later.
> >> >> >>
> >> >> >> Personally I prefer the second one because it is more obvious that
> >> >> >> something was wrong.
> >> >> >>
> >> >> >> Jiangjie (Becket) Qin
> >> >> >>
> >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >> >> >>
> >> >> >> >Yeah I agree we should not silently change the behavior of the
> >> >>function
> >> >> >> >with the given parameters; and I would prefer
> >> >> >>error-logging-and-shutdown
> >> >> >> >over blocking when close(>0) is used, since as Neha suggested
> >> >>blocking
> >> >> >> >would also not proceed with sending any data, bu will just let
> >> >>users to
> >> >> >> >realize the issue later than sooner.
> >> >> >> >
> >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >><ne...@confluent.io>
> >> >> >>wrote:
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > And I also agree it is better if we can make producer block
> >>when
> >> >> >> >> > close() is called from sender thread so user will notice
> >> >>something
> >> >> >> >>went
> >> >> >> >> > wrong.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> This isn't a great experience either. Why can't we just throw
> >>an
> >> >> >> >>exception
> >> >> >> >> for a behavior we know is incorrect and we'd like the user to
> >> >>know.
> >> >> >> >> Blocking as a means of doing that seems wrong and annoying.
> >> >> >> >>
> >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >><ja...@gmail.com>
> >> >> >> wrote:
> >> >> >> >>
> >> >> >> >> > Cool.
> >> >> >> >> >
> >> >> >> >> > I think blocking is good or alternately throwing an exception
> >> >> >>directly
> >> >> >> >> from
> >> >> >> >> > close(). Basically I would just worry about subtly doing
> >> >>something
> >> >> >> >> slightly
> >> >> >> >> > different from what the user asked for as it will be hard to
> >> >>notice
> >> >> >> >>that
> >> >> >> >> > behavior difference.
> >> >> >> >> >
> >> >> >> >> > -Jay
> >> >> >> >> >
> >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> >> >> >><jqin@linkedin.com.invalid
> >> >> >> >> >
> >> >> >> >> > wrote:
> >> >> >> >> >
> >> >> >> >> > > Hi Jay,
> >> >> >> >> > >
> >> >> >> >> > > I have modified the KIP as you suggested. I thinks as long
> >>as
> >> >>we
> >> >> >> >>have
> >> >> >> >> > > consistent define for timeout across Kafka interface, there
> >> >>would
> >> >> >> >>be no
> >> >> >> >> > > problem. And I also agree it is better if we can make
> >>producer
> >> >> >>block
> >> >> >> >> when
> >> >> >> >> > > close() is called from sender thread so user will notice
> >> >> >>something
> >> >> >> >>went
> >> >> >> >> > > wrong.
> >> >> >> >> > >
> >> >> >> >> > > Thanks.
> >> >> >> >> > >
> >> >> >> >> > > Jiangjie (Becket) Qin
> >> >> >> >> > >
> >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com>
> >> wrote:
> >> >> >> >> > >
> >> >> >> >> > > >Hey Jiangjie,
> >> >> >> >> > > >
> >> >> >> >> > > >I think this is going to be very confusing that
> >> >> >> >> > > >  close(0) waits indefinitely and
> >> >> >> >> > > >  close(-1) waits for 0.
> >> >> >> >> > > >I understand this appears in other apis, but it is a
> >>constant
> >> >> >> >>cause of
> >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> >> >> >> > > >
> >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to
> >>wait
> >> >> >> >> indefinitely
> >> >> >> >> > > >as
> >> >> >> >> > > >we already have close() so having a magical constant for
> >> >>that is
> >> >> >> >> > > >redundant.
> >> >> >> >> > > >
> >> >> >> >> > > >Calling close() from the I/O thread was already possible
> >>and
> >> >> >>would
> >> >> >> >> block
> >> >> >> >> > > >indefinitely. I think trying to silently change the
> >>behavior
> >> >>is
> >> >> >> >> probably
> >> >> >> >> > > >not right. I.e. if the user calls close() in the callback
> >> >>there
> >> >> >>is
> >> >> >> >> > > >actually
> >> >> >> >> > > >some misunderstanding and they need to think more,
> >>silently
> >> >> >>making
> >> >> >> >> this
> >> >> >> >> > > >not
> >> >> >> >> > > >block will hide the problem from them which is the
> >>opposite
> >> >>of
> >> >> >> >>what we
> >> >> >> >> > > >want.
> >> >> >> >> > > >
> >> >> >> >> > > >-Jay
> >> >> >> >> > > >
> >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> >> >> >> <jqin@linkedin.com.invalid
> >> >> >> >> > >
> >> >> >> >> > > >wrote:
> >> >> >> >> > > >
> >> >> >> >> > > >> Hey Joe & Jay,
> >> >> >> >> > > >>
> >> >> >> >> > > >> Thanks for the comments on the voting thread. Since it
> >> >>seems
> >> >> >>we
> >> >> >> >> > probably
> >> >> >> >> > > >> will have more discussion on this, I am just replying
> >>from
> >> >>the
> >> >> >> >> > > >>discussion
> >> >> >> >> > > >> thread here.
> >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> >>half-baked,
> >> >> >> >>apologize
> >> >> >> >> > for
> >> >> >> >> > > >> the rush...
> >> >> >> >> > > >>
> >> >> >> >> > > >> The contract in current KIP is:
> >> >> >> >> > > >>   1. close() - wait until all requests either are sent
> >>or
> >> >> >>reach
> >> >> >> >> > request
> >> >> >> >> > > >> timeout.
> >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> >>immediately
> >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
> >> >>close(),
> >> >> >> >>i.e.
> >> >> >> >> > Wait
> >> >> >> >> > > >> until all requests are sent or reach request timeout
> >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
> >> >>finish
> >> >> >> >> sending
> >> >> >> >> > > >>in 5
> >> >> >> >> > > >> milliseconds, if something went wrong, just shutdown the
> >> >> >>producer
> >> >> >> >> > > >>anyway,
> >> >> >> >> > > >> my callback will handle the failures.
> >> >> >> >> > > >>
> >> >> >> >> > > >> About how we define what timeout value stands for, I
> >> >>actually
> >> >> >> >> > struggled
> >> >> >> >> > > >>a
> >> >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
> >> >>should
> >> >> >> >>mean
> >> >> >> >> > > >> immediately, however it seems that all the existing java
> >> >>class
> >> >> >> >>have
> >> >> >> >> > this
> >> >> >> >> > > >> convention of timeout=0 means no timeout or never
> >>timeout
> >> >> >> >> > > >>(Thread.join(0),
> >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
> >> >>follow
> >> >> >>the
> >> >> >> >> > > >> intuition or we follow the convention. What I chose is
> >>to
> >> >> >>follow
> >> >> >> >>the
> >> >> >> >> > > >> convention but document the interface to let user be
> >>aware
> >> >>of
> >> >> >>the
> >> >> >> >> > usage.
> >> >> >> >> > > >> The reason is that I think producer.close() is a public
> >> >> >> >>interface so
> >> >> >> >> > it
> >> >> >> >> > > >> might be better to follow java convention. Whereas
> >> >>selector is
> >> >> >> >>not a
> >> >> >> >> > > >> public interface that used by user, so as long as it
> >>makes
> >> >> >>sense
> >> >> >> >>to
> >> >> >> >> > us,
> >> >> >> >> > > >>it
> >> >> >> >> > > >> is less a problem to be different from java convention.
> >> >>That
> >> >> >>said
> >> >> >> >> > since
> >> >> >> >> > > >> consumer.poll(timeout) is also a public interface, I
> >>think
> >> >>it
> >> >> >> >>also
> >> >> >> >> > makes
> >> >> >> >> > > >> sense to make producer.close() to have the same
> >>definition
> >> >>of
> >> >> >> >> > > >> consumer.poll(timeout).
> >> >> >> >> > > >>
> >> >> >> >> > > >> The main argument for keeping a timeout in close would
> >>be
> >> >> >> >>separating
> >> >> >> >> > the
> >> >> >> >> > > >> close timeout from request timeout, which probably makes
> >> >> >>sense. I
> >> >> >> >> > would
> >> >> >> >> > > >> guess typically the request timeout would be long (e.g.
> >>60
> >> >> >> >>seconds)
> >> >> >> >> > > >> because we might want to consider retries with back off
> >> >>time.
> >> >> >>If
> >> >> >> >>we
> >> >> >> >> > have
> >> >> >> >> > > >> multiple batches in accumulator, in worst case that
> >>could
> >> >>take
> >> >> >> >>up to
> >> >> >> >> > > >> several minutes to complete all the requests. But when
> >>we
> >> >> >>close a
> >> >> >> >> > > >> producer, we might not want to wait for that long as it
> >> >>might
> >> >> >> >>cause
> >> >> >> >> > some
> >> >> >> >> > > >> other problem like deployment tool timeout.
> >> >> >> >> > > >>
> >> >> >> >> > > >> There is also a subtle difference between close(timeout)
> >> >>and
> >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to write
> >> >>data
> >> >> >>to
> >> >> >> >>the
> >> >> >> >> > > >> broker, so it makes perfect sense to wait until request
> >> >> >>timeout.
> >> >> >> >>I
> >> >> >> >> > think
> >> >> >> >> > > >> that is why flush(timeout) looks strange. On the other
> >> >>hand,
> >> >> >>the
> >> >> >> >>top
> >> >> >> >> > > >> priority for close() is to close the producer rather
> >>than
> >> >> >>flush()
> >> >> >> >> > data,
> >> >> >> >> > > >>so
> >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for
> >>its
> >> >>main
> >> >> >> >>job.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is
> >>not a
> >> >> >>public
> >> >> >> >> > > >> interface. I mentioned it in Proposed Changes section
> >> >>which I
> >> >> >> >> thought
> >> >> >> >> > > >>was
> >> >> >> >> > > >> supposed to provide implementation details.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Thanks again for all the comments and suggestions!
> >> >> >> >> > > >>
> >> >> >> >> > > >> Jiangjie (Becket) Qin
> >> >> >> >> > > >>
> >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com>
> >> >> wrote:
> >> >> >> >> > > >>
> >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >> >> >> >> > > >> >I¹d like to initiate the voting process if no further
> >> >> >>comments
> >> >> >> >>are
> >> >> >> >> > > >> >received by tomorrow.
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com>
> >> >>wrote:
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >>Hey Jiangjie,
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>Can you capture the full motivation and use cases for
> >>the
> >> >> >> >>feature?
> >> >> >> >> > > >>This
> >> >> >> >> > > >> >>mentions your interest in having a way of aborting
> >>from
> >> >> >>inside
> >> >> >> >>the
> >> >> >> >> > > >> >>Callback. But it doesn't really explain that usage or
> >>why
> >> >> >>other
> >> >> >> >> > people
> >> >> >> >> > > >> >>would want to do that. It also doesn't list the
> >>primary
> >> >>use
> >> >> >> >>case
> >> >> >> >> for
> >> >> >> >> > > >> >>having
> >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
> >>blocking
> >> >>too
> >> >> >> >>long
> >> >> >> >> on
> >> >> >> >> > > >> >>shutdown.
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>-Jay
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >> >> >> >> > > >><jqin@linkedin.com.invalid
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >>wrote:
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>> Hi,
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to
> >>new
> >> >> >> >> producer.
> >> >> >> >> > > >>Most
> >> >> >> >> > > >> >>>of
> >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where
> >>Parth
> >> >> >> >> Brahmbhatt
> >> >> >> >> > > >>has
> >> >> >> >> > > >> >>> already done a lot of work.
> >> >> >> >> > > >> >>> Since this is an interface change so we are going
> >> >>through
> >> >> >>the
> >> >> >> >> KIP
> >> >> >> >> > > >> >>>process.
> >> >> >> >> > > >> >>> Here is the KIP link:
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >>
> >> >> >> >> > >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >> >> >> > > >> >>>2
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> Thanks.
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >
> >> >> >> >> > > >>
> >> >> >> >> > > >>
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> Thanks,
> >> >> >> >> Neha
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >--
> >> >> >> >-- Guozhang
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >--
> >> >> >-- Guozhang
> >> >>
> >> >>
> >> >
> >> >
> >> >--
> >> >-- Guozhang
> >>
> >>
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Jun,

Close(0) will set two flags in sender. Running=false and a newly added
forceClose=true. It will also set accumulator.closed=true so no further
producer.send() will succeed.
The sender thread will finish executing all the callbacks in current batch
of responses, then it will see the forceClose flag. It will just fail all
the incomplete batches in the producer and exit.
So close(0) is a non-blocking call and sender thread will not try to join
itself in close(0).

Thanks.

Jiangjie (Becket) Qin

On 3/16/15, 2:50 PM, "Jun Rao" <ju...@confluent.io> wrote:

>How does close(0) work if it's called from the sender thread? If close(0)
>needs to wait for the sender thread to join, wouldn't this cause a
>deadlock?
>
>Thanks,
>
>Jun
>
>On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Thanks Guozhang. It wouldn’t be as thoroughly considered without
>> discussing with you :)
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>>
>> >Thanks Jiangjie,
>> >
>> >After talking to you offline on this, I have been convinced and
>>changed my
>> >preference to blocking. The immediate shutdown approach does have some
>> >unsafeness in some cases.
>> >
>> >Guozhang
>> >
>> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> >wrote:
>> >
>> >> It looks that the problem we want to solve and the purpose we want to
>> >> achieve is:
>> >> If user uses close() in callback, we want to let user be aware that
>>they
>> >> should use close(0) instead of close() in the callback.
>> >>
>> >> We have agreed that we will have an error log to inform user about
>>this
>> >> mis-usage. The options differ in the way how we can force user to
>>take a
>> >> look at that error log.
>> >> There are two scenarios:
>> >> 1. User does not expect the program to exit.
>> >> 2. User expect the program to exit.
>> >>
>> >> For scenario 1), blocking will probably delay the discovery of the
>> >> problem. Calling close(0) exposes the problem quicker. In this
>>scenario
>> >> producer just encounter a send failure when running normally.
>> >> For scenario 2), blocking will expose the problem quick. Calling
>> >>close(-1)
>> >> might hide the problem. This scenario might include: a) Unit test
>>for a
>> >> send failure. b) Message sending during a close() call from a user
>> >>thread.
>> >>
>> >> So as a summary table:
>> >>
>> >>                   Scenario 1)                         Scenario 2)
>> >>
>> >> Blocking      Delay problem discovery         Guaranteed problem
>> >>discovery
>> >>
>> >> Close(-1)     Immediate problem discovery     Problem might be hidden
>> >>
>> >>
>> >> Personally I prefer blocking because it seems providing more
>>guarantees
>> >> and safer.
>> >>
>> >> Thanks.
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >>
>> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> >>
>> >> >HI Jiangjie,
>> >> >
>> >> >As far as I understand calling close() in the ioThread is not
>>common,
>> >>as
>> >> >it
>> >> >may only trigger when we saw some non-retriable error. Hence when
>>user
>> >>run
>> >> >their program it is unlikely that close() will be triggered and
>>problem
>> >> >will be detected. So it seems to me that from the error detection
>> >>aspect
>> >> >these two options seems to be the same as people will usually
>>detect it
>> >> >from the producer metrics all dropping to 0.
>> >> >
>> >> >Guozhang
>> >> >
>> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>> >><jq...@linkedin.com.invalid>
>> >> >wrote:
>> >> >
>> >> >> It seems there are two options we can choose from when close() is
>> >>called
>> >> >> from sender thread (callback):
>> >> >> 1. Log an error and close the producer using close(-1)
>> >> >> 2. Log an error and block.
>> >> >> (Throwing an exception will not work because we catch all the
>> >>exception
>> >> >> thrown from user callback. It will just lead to an error log.)
>> >> >>
>> >> >> My concern for the first option is that the producer will be
>>closed
>> >>even
>> >> >> if we logged and error. I am wondering if some user would not even
>> >>take
>> >> >>a
>> >> >> look at the log if producer is closed normally. Because from the
>> >> >>programs
>> >> >> behavior, everything looks good. If that is the case, the error
>> >>message
>> >> >>we
>> >> >> logged probably will just be ignored until some day when people
>>check
>> >> >>the
>> >> >> log and see it.
>> >> >>
>> >> >> As for the second option, because producer does not close but
>>blocks.
>> >> >>User
>> >> >> will notice this the first time they run the program. They
>>probably
>> >>will
>> >> >> look at the log to see why producer could not be closed and they
>>will
>> >> >>see
>> >> >> the error log we put there. So they will get informed about this
>> >> >>mis-usage
>> >> >> of close() in sender thread the first time they run the code
>>instead
>> >>of
>> >> >> some time later.
>> >> >>
>> >> >> Personally I prefer the second one because it is more obvious that
>> >> >> something was wrong.
>> >> >>
>> >> >> Jiangjie (Becket) Qin
>> >> >>
>> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> >> >>
>> >> >> >Yeah I agree we should not silently change the behavior of the
>> >>function
>> >> >> >with the given parameters; and I would prefer
>> >> >>error-logging-and-shutdown
>> >> >> >over blocking when close(>0) is used, since as Neha suggested
>> >>blocking
>> >> >> >would also not proceed with sending any data, bu will just let
>> >>users to
>> >> >> >realize the issue later than sooner.
>> >> >> >
>> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
>><ne...@confluent.io>
>> >> >>wrote:
>> >> >> >
>> >> >> >> >
>> >> >> >> > And I also agree it is better if we can make producer block
>>when
>> >> >> >> > close() is called from sender thread so user will notice
>> >>something
>> >> >> >>went
>> >> >> >> > wrong.
>> >> >> >>
>> >> >> >>
>> >> >> >> This isn't a great experience either. Why can't we just throw
>>an
>> >> >> >>exception
>> >> >> >> for a behavior we know is incorrect and we'd like the user to
>> >>know.
>> >> >> >> Blocking as a means of doing that seems wrong and annoying.
>> >> >> >>
>> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
>><ja...@gmail.com>
>> >> >> wrote:
>> >> >> >>
>> >> >> >> > Cool.
>> >> >> >> >
>> >> >> >> > I think blocking is good or alternately throwing an exception
>> >> >>directly
>> >> >> >> from
>> >> >> >> > close(). Basically I would just worry about subtly doing
>> >>something
>> >> >> >> slightly
>> >> >> >> > different from what the user asked for as it will be hard to
>> >>notice
>> >> >> >>that
>> >> >> >> > behavior difference.
>> >> >> >> >
>> >> >> >> > -Jay
>> >> >> >> >
>> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> >> >> >><jqin@linkedin.com.invalid
>> >> >> >> >
>> >> >> >> > wrote:
>> >> >> >> >
>> >> >> >> > > Hi Jay,
>> >> >> >> > >
>> >> >> >> > > I have modified the KIP as you suggested. I thinks as long
>>as
>> >>we
>> >> >> >>have
>> >> >> >> > > consistent define for timeout across Kafka interface, there
>> >>would
>> >> >> >>be no
>> >> >> >> > > problem. And I also agree it is better if we can make
>>producer
>> >> >>block
>> >> >> >> when
>> >> >> >> > > close() is called from sender thread so user will notice
>> >> >>something
>> >> >> >>went
>> >> >> >> > > wrong.
>> >> >> >> > >
>> >> >> >> > > Thanks.
>> >> >> >> > >
>> >> >> >> > > Jiangjie (Becket) Qin
>> >> >> >> > >
>> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com>
>> wrote:
>> >> >> >> > >
>> >> >> >> > > >Hey Jiangjie,
>> >> >> >> > > >
>> >> >> >> > > >I think this is going to be very confusing that
>> >> >> >> > > >  close(0) waits indefinitely and
>> >> >> >> > > >  close(-1) waits for 0.
>> >> >> >> > > >I understand this appears in other apis, but it is a
>>constant
>> >> >> >>cause of
>> >> >> >> > > >bugs. Let's not repeat that mistake.
>> >> >> >> > > >
>> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to
>>wait
>> >> >> >> indefinitely
>> >> >> >> > > >as
>> >> >> >> > > >we already have close() so having a magical constant for
>> >>that is
>> >> >> >> > > >redundant.
>> >> >> >> > > >
>> >> >> >> > > >Calling close() from the I/O thread was already possible
>>and
>> >> >>would
>> >> >> >> block
>> >> >> >> > > >indefinitely. I think trying to silently change the
>>behavior
>> >>is
>> >> >> >> probably
>> >> >> >> > > >not right. I.e. if the user calls close() in the callback
>> >>there
>> >> >>is
>> >> >> >> > > >actually
>> >> >> >> > > >some misunderstanding and they need to think more,
>>silently
>> >> >>making
>> >> >> >> this
>> >> >> >> > > >not
>> >> >> >> > > >block will hide the problem from them which is the
>>opposite
>> >>of
>> >> >> >>what we
>> >> >> >> > > >want.
>> >> >> >> > > >
>> >> >> >> > > >-Jay
>> >> >> >> > > >
>> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> >> >> >> <jqin@linkedin.com.invalid
>> >> >> >> > >
>> >> >> >> > > >wrote:
>> >> >> >> > > >
>> >> >> >> > > >> Hey Joe & Jay,
>> >> >> >> > > >>
>> >> >> >> > > >> Thanks for the comments on the voting thread. Since it
>> >>seems
>> >> >>we
>> >> >> >> > probably
>> >> >> >> > > >> will have more discussion on this, I am just replying
>>from
>> >>the
>> >> >> >> > > >>discussion
>> >> >> >> > > >> thread here.
>> >> >> >> > > >> I’ve updated the KIP page to make it less like
>>half-baked,
>> >> >> >>apologize
>> >> >> >> > for
>> >> >> >> > > >> the rush...
>> >> >> >> > > >>
>> >> >> >> > > >> The contract in current KIP is:
>> >> >> >> > > >>   1. close() - wait until all requests either are sent
>>or
>> >> >>reach
>> >> >> >> > request
>> >> >> >> > > >> timeout.
>> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
>>immediately
>> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
>> >>close(),
>> >> >> >>i.e.
>> >> >> >> > Wait
>> >> >> >> > > >> until all requests are sent or reach request timeout
>> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
>> >>finish
>> >> >> >> sending
>> >> >> >> > > >>in 5
>> >> >> >> > > >> milliseconds, if something went wrong, just shutdown the
>> >> >>producer
>> >> >> >> > > >>anyway,
>> >> >> >> > > >> my callback will handle the failures.
>> >> >> >> > > >>
>> >> >> >> > > >> About how we define what timeout value stands for, I
>> >>actually
>> >> >> >> > struggled
>> >> >> >> > > >>a
>> >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
>> >>should
>> >> >> >>mean
>> >> >> >> > > >> immediately, however it seems that all the existing java
>> >>class
>> >> >> >>have
>> >> >> >> > this
>> >> >> >> > > >> convention of timeout=0 means no timeout or never
>>timeout
>> >> >> >> > > >>(Thread.join(0),
>> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
>> >>follow
>> >> >>the
>> >> >> >> > > >> intuition or we follow the convention. What I chose is
>>to
>> >> >>follow
>> >> >> >>the
>> >> >> >> > > >> convention but document the interface to let user be
>>aware
>> >>of
>> >> >>the
>> >> >> >> > usage.
>> >> >> >> > > >> The reason is that I think producer.close() is a public
>> >> >> >>interface so
>> >> >> >> > it
>> >> >> >> > > >> might be better to follow java convention. Whereas
>> >>selector is
>> >> >> >>not a
>> >> >> >> > > >> public interface that used by user, so as long as it
>>makes
>> >> >>sense
>> >> >> >>to
>> >> >> >> > us,
>> >> >> >> > > >>it
>> >> >> >> > > >> is less a problem to be different from java convention.
>> >>That
>> >> >>said
>> >> >> >> > since
>> >> >> >> > > >> consumer.poll(timeout) is also a public interface, I
>>think
>> >>it
>> >> >> >>also
>> >> >> >> > makes
>> >> >> >> > > >> sense to make producer.close() to have the same
>>definition
>> >>of
>> >> >> >> > > >> consumer.poll(timeout).
>> >> >> >> > > >>
>> >> >> >> > > >> The main argument for keeping a timeout in close would
>>be
>> >> >> >>separating
>> >> >> >> > the
>> >> >> >> > > >> close timeout from request timeout, which probably makes
>> >> >>sense. I
>> >> >> >> > would
>> >> >> >> > > >> guess typically the request timeout would be long (e.g.
>>60
>> >> >> >>seconds)
>> >> >> >> > > >> because we might want to consider retries with back off
>> >>time.
>> >> >>If
>> >> >> >>we
>> >> >> >> > have
>> >> >> >> > > >> multiple batches in accumulator, in worst case that
>>could
>> >>take
>> >> >> >>up to
>> >> >> >> > > >> several minutes to complete all the requests. But when
>>we
>> >> >>close a
>> >> >> >> > > >> producer, we might not want to wait for that long as it
>> >>might
>> >> >> >>cause
>> >> >> >> > some
>> >> >> >> > > >> other problem like deployment tool timeout.
>> >> >> >> > > >>
>> >> >> >> > > >> There is also a subtle difference between close(timeout)
>> >>and
>> >> >> >> > > >> flush(timeout). The only purpose for flush() is to write
>> >>data
>> >> >>to
>> >> >> >>the
>> >> >> >> > > >> broker, so it makes perfect sense to wait until request
>> >> >>timeout.
>> >> >> >>I
>> >> >> >> > think
>> >> >> >> > > >> that is why flush(timeout) looks strange. On the other
>> >>hand,
>> >> >>the
>> >> >> >>top
>> >> >> >> > > >> priority for close() is to close the producer rather
>>than
>> >> >>flush()
>> >> >> >> > data,
>> >> >> >> > > >>so
>> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for
>>its
>> >>main
>> >> >> >>job.
>> >> >> >> > > >>
>> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is
>>not a
>> >> >>public
>> >> >> >> > > >> interface. I mentioned it in Proposed Changes section
>> >>which I
>> >> >> >> thought
>> >> >> >> > > >>was
>> >> >> >> > > >> supposed to provide implementation details.
>> >> >> >> > > >>
>> >> >> >> > > >> Thanks again for all the comments and suggestions!
>> >> >> >> > > >>
>> >> >> >> > > >> Jiangjie (Becket) Qin
>> >> >> >> > > >>
>> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com>
>> >> wrote:
>> >> >> >> > > >>
>> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
>> >> >> >> > > >> >I¹d like to initiate the voting process if no further
>> >> >>comments
>> >> >> >>are
>> >> >> >> > > >> >received by tomorrow.
>> >> >> >> > > >> >
>> >> >> >> > > >> >Jiangjie (Becket) Qin
>> >> >> >> > > >> >
>> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com>
>> >>wrote:
>> >> >> >> > > >> >
>> >> >> >> > > >> >>Hey Jiangjie,
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>Can you capture the full motivation and use cases for
>>the
>> >> >> >>feature?
>> >> >> >> > > >>This
>> >> >> >> > > >> >>mentions your interest in having a way of aborting
>>from
>> >> >>inside
>> >> >> >>the
>> >> >> >> > > >> >>Callback. But it doesn't really explain that usage or
>>why
>> >> >>other
>> >> >> >> > people
>> >> >> >> > > >> >>would want to do that. It also doesn't list the
>>primary
>> >>use
>> >> >> >>case
>> >> >> >> for
>> >> >> >> > > >> >>having
>> >> >> >> > > >> >>close with a bounded timeout which was to avoid
>>blocking
>> >>too
>> >> >> >>long
>> >> >> >> on
>> >> >> >> > > >> >>shutdown.
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>-Jay
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >> >> >> > > >><jqin@linkedin.com.invalid
>> >> >> >> > > >> >
>> >> >> >> > > >> >>wrote:
>> >> >> >> > > >> >>
>> >> >> >> > > >> >>> Hi,
>> >> >> >> > > >> >>>
>> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to
>>new
>> >> >> >> producer.
>> >> >> >> > > >>Most
>> >> >> >> > > >> >>>of
>> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where
>>Parth
>> >> >> >> Brahmbhatt
>> >> >> >> > > >>has
>> >> >> >> > > >> >>> already done a lot of work.
>> >> >> >> > > >> >>> Since this is an interface change so we are going
>> >>through
>> >> >>the
>> >> >> >> KIP
>> >> >> >> > > >> >>>process.
>> >> >> >> > > >> >>> Here is the KIP link:
>> >> >> >> > > >> >>>
>> >> >> >> > > >> >>>
>> >> >> >> > > >>
>> >> >> >> > >
>> >> >> >>
>> >> >>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >> >> >> > > >> >>>2
>> >> >> >> > > >> >>>
>> >> >> >> > > >> >>> Thanks.
>> >> >> >> > > >> >>>
>> >> >> >> > > >> >>> Jiangjie (Becket) Qin
>> >> >> >> > > >> >>>
>> >> >> >> > > >> >
>> >> >> >> > > >>
>> >> >> >> > > >>
>> >> >> >> > >
>> >> >> >> > >
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Thanks,
>> >> >> >> Neha
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >-- Guozhang
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >--
>> >> >-- Guozhang
>> >>
>> >>
>> >
>> >
>> >--
>> >-- Guozhang
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jun Rao <ju...@confluent.io>.

How does close(0) work if it's called from the sender thread? If close(0)
needs to wait for the sender thread to join, wouldn't this cause a deadlock?

Thanks,

Jun

On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> discussing with you :)
>
> Jiangjie (Becket) Qin
>
> On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
> >Thanks Jiangjie,
> >
> >After talking to you offline on this, I have been convinced and changed my
> >preference to blocking. The immediate shutdown approach does have some
> >unsafeness in some cases.
> >
> >Guozhang
> >
> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> >wrote:
> >
> >> It looks that the problem we want to solve and the purpose we want to
> >> achieve is:
> >> If user uses close() in callback, we want to let user be aware that they
> >> should use close(0) instead of close() in the callback.
> >>
> >> We have agreed that we will have an error log to inform user about this
> >> mis-usage. The options differ in the way how we can force user to take a
> >> look at that error log.
> >> There are two scenarios:
> >> 1. User does not expect the program to exit.
> >> 2. User expect the program to exit.
> >>
> >> For scenario 1), blocking will probably delay the discovery of the
> >> problem. Calling close(0) exposes the problem quicker. In this scenario
> >> producer just encounter a send failure when running normally.
> >> For scenario 2), blocking will expose the problem quick. Calling
> >>close(-1)
> >> might hide the problem. This scenario might include: a) Unit test for a
> >> send failure. b) Message sending during a close() call from a user
> >>thread.
> >>
> >> So as a summary table:
> >>
> >>                   Scenario 1)                         Scenario 2)
> >>
> >> Blocking      Delay problem discovery         Guaranteed problem
> >>discovery
> >>
> >> Close(-1)     Immediate problem discovery     Problem might be hidden
> >>
> >>
> >> Personally I prefer blocking because it seems providing more guarantees
> >> and safer.
> >>
> >> Thanks.
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >>
> >> >HI Jiangjie,
> >> >
> >> >As far as I understand calling close() in the ioThread is not common,
> >>as
> >> >it
> >> >may only trigger when we saw some non-retriable error. Hence when user
> >>run
> >> >their program it is unlikely that close() will be triggered and problem
> >> >will be detected. So it seems to me that from the error detection
> >>aspect
> >> >these two options seems to be the same as people will usually detect it
> >> >from the producer metrics all dropping to 0.
> >> >
> >> >Guozhang
> >> >
> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >><jq...@linkedin.com.invalid>
> >> >wrote:
> >> >
> >> >> It seems there are two options we can choose from when close() is
> >>called
> >> >> from sender thread (callback):
> >> >> 1. Log an error and close the producer using close(-1)
> >> >> 2. Log an error and block.
> >> >> (Throwing an exception will not work because we catch all the
> >>exception
> >> >> thrown from user callback. It will just lead to an error log.)
> >> >>
> >> >> My concern for the first option is that the producer will be closed
> >>even
> >> >> if we logged and error. I am wondering if some user would not even
> >>take
> >> >>a
> >> >> look at the log if producer is closed normally. Because from the
> >> >>programs
> >> >> behavior, everything looks good. If that is the case, the error
> >>message
> >> >>we
> >> >> logged probably will just be ignored until some day when people check
> >> >>the
> >> >> log and see it.
> >> >>
> >> >> As for the second option, because producer does not close but blocks.
> >> >>User
> >> >> will notice this the first time they run the program. They probably
> >>will
> >> >> look at the log to see why producer could not be closed and they will
> >> >>see
> >> >> the error log we put there. So they will get informed about this
> >> >>mis-usage
> >> >> of close() in sender thread the first time they run the code instead
> >>of
> >> >> some time later.
> >> >>
> >> >> Personally I prefer the second one because it is more obvious that
> >> >> something was wrong.
> >> >>
> >> >> Jiangjie (Becket) Qin
> >> >>
> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >> >>
> >> >> >Yeah I agree we should not silently change the behavior of the
> >>function
> >> >> >with the given parameters; and I would prefer
> >> >>error-logging-and-shutdown
> >> >> >over blocking when close(>0) is used, since as Neha suggested
> >>blocking
> >> >> >would also not proceed with sending any data, bu will just let
> >>users to
> >> >> >realize the issue later than sooner.
> >> >> >
> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io>
> >> >>wrote:
> >> >> >
> >> >> >> >
> >> >> >> > And I also agree it is better if we can make producer block when
> >> >> >> > close() is called from sender thread so user will notice
> >>something
> >> >> >>went
> >> >> >> > wrong.
> >> >> >>
> >> >> >>
> >> >> >> This isn't a great experience either. Why can't we just throw an
> >> >> >>exception
> >> >> >> for a behavior we know is incorrect and we'd like the user to
> >>know.
> >> >> >> Blocking as a means of doing that seems wrong and annoying.
> >> >> >>
> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com>
> >> >> wrote:
> >> >> >>
> >> >> >> > Cool.
> >> >> >> >
> >> >> >> > I think blocking is good or alternately throwing an exception
> >> >>directly
> >> >> >> from
> >> >> >> > close(). Basically I would just worry about subtly doing
> >>something
> >> >> >> slightly
> >> >> >> > different from what the user asked for as it will be hard to
> >>notice
> >> >> >>that
> >> >> >> > behavior difference.
> >> >> >> >
> >> >> >> > -Jay
> >> >> >> >
> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> >> >><jqin@linkedin.com.invalid
> >> >> >> >
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> > > Hi Jay,
> >> >> >> > >
> >> >> >> > > I have modified the KIP as you suggested. I thinks as long as
> >>we
> >> >> >>have
> >> >> >> > > consistent define for timeout across Kafka interface, there
> >>would
> >> >> >>be no
> >> >> >> > > problem. And I also agree it is better if we can make producer
> >> >>block
> >> >> >> when
> >> >> >> > > close() is called from sender thread so user will notice
> >> >>something
> >> >> >>went
> >> >> >> > > wrong.
> >> >> >> > >
> >> >> >> > > Thanks.
> >> >> >> > >
> >> >> >> > > Jiangjie (Becket) Qin
> >> >> >> > >
> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com>
> wrote:
> >> >> >> > >
> >> >> >> > > >Hey Jiangjie,
> >> >> >> > > >
> >> >> >> > > >I think this is going to be very confusing that
> >> >> >> > > >  close(0) waits indefinitely and
> >> >> >> > > >  close(-1) waits for 0.
> >> >> >> > > >I understand this appears in other apis, but it is a constant
> >> >> >>cause of
> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> >> >> > > >
> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait
> >> >> >> indefinitely
> >> >> >> > > >as
> >> >> >> > > >we already have close() so having a magical constant for
> >>that is
> >> >> >> > > >redundant.
> >> >> >> > > >
> >> >> >> > > >Calling close() from the I/O thread was already possible and
> >> >>would
> >> >> >> block
> >> >> >> > > >indefinitely. I think trying to silently change the behavior
> >>is
> >> >> >> probably
> >> >> >> > > >not right. I.e. if the user calls close() in the callback
> >>there
> >> >>is
> >> >> >> > > >actually
> >> >> >> > > >some misunderstanding and they need to think more, silently
> >> >>making
> >> >> >> this
> >> >> >> > > >not
> >> >> >> > > >block will hide the problem from them which is the opposite
> >>of
> >> >> >>what we
> >> >> >> > > >want.
> >> >> >> > > >
> >> >> >> > > >-Jay
> >> >> >> > > >
> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> >> >> <jqin@linkedin.com.invalid
> >> >> >> > >
> >> >> >> > > >wrote:
> >> >> >> > > >
> >> >> >> > > >> Hey Joe & Jay,
> >> >> >> > > >>
> >> >> >> > > >> Thanks for the comments on the voting thread. Since it
> >>seems
> >> >>we
> >> >> >> > probably
> >> >> >> > > >> will have more discussion on this, I am just replying from
> >>the
> >> >> >> > > >>discussion
> >> >> >> > > >> thread here.
> >> >> >> > > >> I’ve updated the KIP page to make it less like half-baked,
> >> >> >>apologize
> >> >> >> > for
> >> >> >> > > >> the rush...
> >> >> >> > > >>
> >> >> >> > > >> The contract in current KIP is:
> >> >> >> > > >>   1. close() - wait until all requests either are sent or
> >> >>reach
> >> >> >> > request
> >> >> >> > > >> timeout.
> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
> >>close(),
> >> >> >>i.e.
> >> >> >> > Wait
> >> >> >> > > >> until all requests are sent or reach request timeout
> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
> >>finish
> >> >> >> sending
> >> >> >> > > >>in 5
> >> >> >> > > >> milliseconds, if something went wrong, just shutdown the
> >> >>producer
> >> >> >> > > >>anyway,
> >> >> >> > > >> my callback will handle the failures.
> >> >> >> > > >>
> >> >> >> > > >> About how we define what timeout value stands for, I
> >>actually
> >> >> >> > struggled
> >> >> >> > > >>a
> >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
> >>should
> >> >> >>mean
> >> >> >> > > >> immediately, however it seems that all the existing java
> >>class
> >> >> >>have
> >> >> >> > this
> >> >> >> > > >> convention of timeout=0 means no timeout or never timeout
> >> >> >> > > >>(Thread.join(0),
> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
> >>follow
> >> >>the
> >> >> >> > > >> intuition or we follow the convention. What I chose is to
> >> >>follow
> >> >> >>the
> >> >> >> > > >> convention but document the interface to let user be aware
> >>of
> >> >>the
> >> >> >> > usage.
> >> >> >> > > >> The reason is that I think producer.close() is a public
> >> >> >>interface so
> >> >> >> > it
> >> >> >> > > >> might be better to follow java convention. Whereas
> >>selector is
> >> >> >>not a
> >> >> >> > > >> public interface that used by user, so as long as it makes
> >> >>sense
> >> >> >>to
> >> >> >> > us,
> >> >> >> > > >>it
> >> >> >> > > >> is less a problem to be different from java convention.
> >>That
> >> >>said
> >> >> >> > since
> >> >> >> > > >> consumer.poll(timeout) is also a public interface, I think
> >>it
> >> >> >>also
> >> >> >> > makes
> >> >> >> > > >> sense to make producer.close() to have the same definition
> >>of
> >> >> >> > > >> consumer.poll(timeout).
> >> >> >> > > >>
> >> >> >> > > >> The main argument for keeping a timeout in close would be
> >> >> >>separating
> >> >> >> > the
> >> >> >> > > >> close timeout from request timeout, which probably makes
> >> >>sense. I
> >> >> >> > would
> >> >> >> > > >> guess typically the request timeout would be long (e.g. 60
> >> >> >>seconds)
> >> >> >> > > >> because we might want to consider retries with back off
> >>time.
> >> >>If
> >> >> >>we
> >> >> >> > have
> >> >> >> > > >> multiple batches in accumulator, in worst case that could
> >>take
> >> >> >>up to
> >> >> >> > > >> several minutes to complete all the requests. But when we
> >> >>close a
> >> >> >> > > >> producer, we might not want to wait for that long as it
> >>might
> >> >> >>cause
> >> >> >> > some
> >> >> >> > > >> other problem like deployment tool timeout.
> >> >> >> > > >>
> >> >> >> > > >> There is also a subtle difference between close(timeout)
> >>and
> >> >> >> > > >> flush(timeout). The only purpose for flush() is to write
> >>data
> >> >>to
> >> >> >>the
> >> >> >> > > >> broker, so it makes perfect sense to wait until request
> >> >>timeout.
> >> >> >>I
> >> >> >> > think
> >> >> >> > > >> that is why flush(timeout) looks strange. On the other
> >>hand,
> >> >>the
> >> >> >>top
> >> >> >> > > >> priority for close() is to close the producer rather than
> >> >>flush()
> >> >> >> > data,
> >> >> >> > > >>so
> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for its
> >>main
> >> >> >>job.
> >> >> >> > > >>
> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is not a
> >> >>public
> >> >> >> > > >> interface. I mentioned it in Proposed Changes section
> >>which I
> >> >> >> thought
> >> >> >> > > >>was
> >> >> >> > > >> supposed to provide implementation details.
> >> >> >> > > >>
> >> >> >> > > >> Thanks again for all the comments and suggestions!
> >> >> >> > > >>
> >> >> >> > > >> Jiangjie (Becket) Qin
> >> >> >> > > >>
> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com>
> >> wrote:
> >> >> >> > > >>
> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >> >> >> > > >> >I¹d like to initiate the voting process if no further
> >> >>comments
> >> >> >>are
> >> >> >> > > >> >received by tomorrow.
> >> >> >> > > >> >
> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> >> >> > > >> >
> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com>
> >>wrote:
> >> >> >> > > >> >
> >> >> >> > > >> >>Hey Jiangjie,
> >> >> >> > > >> >>
> >> >> >> > > >> >>Can you capture the full motivation and use cases for the
> >> >> >>feature?
> >> >> >> > > >>This
> >> >> >> > > >> >>mentions your interest in having a way of aborting from
> >> >>inside
> >> >> >>the
> >> >> >> > > >> >>Callback. But it doesn't really explain that usage or why
> >> >>other
> >> >> >> > people
> >> >> >> > > >> >>would want to do that. It also doesn't list the primary
> >>use
> >> >> >>case
> >> >> >> for
> >> >> >> > > >> >>having
> >> >> >> > > >> >>close with a bounded timeout which was to avoid blocking
> >>too
> >> >> >>long
> >> >> >> on
> >> >> >> > > >> >>shutdown.
> >> >> >> > > >> >>
> >> >> >> > > >> >>-Jay
> >> >> >> > > >> >>
> >> >> >> > > >> >>
> >> >> >> > > >> >>
> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >> >> >> > > >><jqin@linkedin.com.invalid
> >> >> >> > > >> >
> >> >> >> > > >> >>wrote:
> >> >> >> > > >> >>
> >> >> >> > > >> >>> Hi,
> >> >> >> > > >> >>>
> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new
> >> >> >> producer.
> >> >> >> > > >>Most
> >> >> >> > > >> >>>of
> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
> >> >> >> Brahmbhatt
> >> >> >> > > >>has
> >> >> >> > > >> >>> already done a lot of work.
> >> >> >> > > >> >>> Since this is an interface change so we are going
> >>through
> >> >>the
> >> >> >> KIP
> >> >> >> > > >> >>>process.
> >> >> >> > > >> >>> Here is the KIP link:
> >> >> >> > > >> >>>
> >> >> >> > > >> >>>
> >> >> >> > > >>
> >> >> >> > >
> >> >> >>
> >> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >> >> > > >> >>>2
> >> >> >> > > >> >>>
> >> >> >> > > >> >>> Thanks.
> >> >> >> > > >> >>>
> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> >> >> > > >> >>>
> >> >> >> > > >> >
> >> >> >> > > >>
> >> >> >> > > >>
> >> >> >> > >
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Thanks,
> >> >> >> Neha
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >--
> >> >> >-- Guozhang
> >> >>
> >> >>
> >> >
> >> >
> >> >--
> >> >-- Guozhang
> >>
> >>
> >
> >
> >--
> >-- Guozhang
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Thanks Guozhang. It wouldn’t be as thoroughly considered without
discussing with you :)

Jiangjie (Becket) Qin

On 3/16/15, 1:07 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

>Thanks Jiangjie,
>
>After talking to you offline on this, I have been convinced and changed my
>preference to blocking. The immediate shutdown approach does have some
>unsafeness in some cases.
>
>Guozhang
>
>On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> It looks that the problem we want to solve and the purpose we want to
>> achieve is:
>> If user uses close() in callback, we want to let user be aware that they
>> should use close(0) instead of close() in the callback.
>>
>> We have agreed that we will have an error log to inform user about this
>> mis-usage. The options differ in the way how we can force user to take a
>> look at that error log.
>> There are two scenarios:
>> 1. User does not expect the program to exit.
>> 2. User expect the program to exit.
>>
>> For scenario 1), blocking will probably delay the discovery of the
>> problem. Calling close(0) exposes the problem quicker. In this scenario
>> producer just encounter a send failure when running normally.
>> For scenario 2), blocking will expose the problem quick. Calling
>>close(-1)
>> might hide the problem. This scenario might include: a) Unit test for a
>> send failure. b) Message sending during a close() call from a user
>>thread.
>>
>> So as a summary table:
>>
>>                   Scenario 1)                         Scenario 2)
>>
>> Blocking      Delay problem discovery         Guaranteed problem
>>discovery
>>
>> Close(-1)     Immediate problem discovery     Problem might be hidden
>>
>>
>> Personally I prefer blocking because it seems providing more guarantees
>> and safer.
>>
>> Thanks.
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>>
>> >HI Jiangjie,
>> >
>> >As far as I understand calling close() in the ioThread is not common,
>>as
>> >it
>> >may only trigger when we saw some non-retriable error. Hence when user
>>run
>> >their program it is unlikely that close() will be triggered and problem
>> >will be detected. So it seems to me that from the error detection
>>aspect
>> >these two options seems to be the same as people will usually detect it
>> >from the producer metrics all dropping to 0.
>> >
>> >Guozhang
>> >
>> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> >wrote:
>> >
>> >> It seems there are two options we can choose from when close() is
>>called
>> >> from sender thread (callback):
>> >> 1. Log an error and close the producer using close(-1)
>> >> 2. Log an error and block.
>> >> (Throwing an exception will not work because we catch all the
>>exception
>> >> thrown from user callback. It will just lead to an error log.)
>> >>
>> >> My concern for the first option is that the producer will be closed
>>even
>> >> if we logged and error. I am wondering if some user would not even
>>take
>> >>a
>> >> look at the log if producer is closed normally. Because from the
>> >>programs
>> >> behavior, everything looks good. If that is the case, the error
>>message
>> >>we
>> >> logged probably will just be ignored until some day when people check
>> >>the
>> >> log and see it.
>> >>
>> >> As for the second option, because producer does not close but blocks.
>> >>User
>> >> will notice this the first time they run the program. They probably
>>will
>> >> look at the log to see why producer could not be closed and they will
>> >>see
>> >> the error log we put there. So they will get informed about this
>> >>mis-usage
>> >> of close() in sender thread the first time they run the code instead
>>of
>> >> some time later.
>> >>
>> >> Personally I prefer the second one because it is more obvious that
>> >> something was wrong.
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> >>
>> >> >Yeah I agree we should not silently change the behavior of the
>>function
>> >> >with the given parameters; and I would prefer
>> >>error-logging-and-shutdown
>> >> >over blocking when close(>0) is used, since as Neha suggested
>>blocking
>> >> >would also not proceed with sending any data, bu will just let
>>users to
>> >> >realize the issue later than sooner.
>> >> >
>> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io>
>> >>wrote:
>> >> >
>> >> >> >
>> >> >> > And I also agree it is better if we can make producer block when
>> >> >> > close() is called from sender thread so user will notice
>>something
>> >> >>went
>> >> >> > wrong.
>> >> >>
>> >> >>
>> >> >> This isn't a great experience either. Why can't we just throw an
>> >> >>exception
>> >> >> for a behavior we know is incorrect and we'd like the user to
>>know.
>> >> >> Blocking as a means of doing that seems wrong and annoying.
>> >> >>
>> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com>
>> >> wrote:
>> >> >>
>> >> >> > Cool.
>> >> >> >
>> >> >> > I think blocking is good or alternately throwing an exception
>> >>directly
>> >> >> from
>> >> >> > close(). Basically I would just worry about subtly doing
>>something
>> >> >> slightly
>> >> >> > different from what the user asked for as it will be hard to
>>notice
>> >> >>that
>> >> >> > behavior difference.
>> >> >> >
>> >> >> > -Jay
>> >> >> >
>> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> >> >><jqin@linkedin.com.invalid
>> >> >> >
>> >> >> > wrote:
>> >> >> >
>> >> >> > > Hi Jay,
>> >> >> > >
>> >> >> > > I have modified the KIP as you suggested. I thinks as long as
>>we
>> >> >>have
>> >> >> > > consistent define for timeout across Kafka interface, there
>>would
>> >> >>be no
>> >> >> > > problem. And I also agree it is better if we can make producer
>> >>block
>> >> >> when
>> >> >> > > close() is called from sender thread so user will notice
>> >>something
>> >> >>went
>> >> >> > > wrong.
>> >> >> > >
>> >> >> > > Thanks.
>> >> >> > >
>> >> >> > > Jiangjie (Becket) Qin
>> >> >> > >
>> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >> >> > >
>> >> >> > > >Hey Jiangjie,
>> >> >> > > >
>> >> >> > > >I think this is going to be very confusing that
>> >> >> > > >  close(0) waits indefinitely and
>> >> >> > > >  close(-1) waits for 0.
>> >> >> > > >I understand this appears in other apis, but it is a constant
>> >> >>cause of
>> >> >> > > >bugs. Let's not repeat that mistake.
>> >> >> > > >
>> >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait
>> >> >> indefinitely
>> >> >> > > >as
>> >> >> > > >we already have close() so having a magical constant for
>>that is
>> >> >> > > >redundant.
>> >> >> > > >
>> >> >> > > >Calling close() from the I/O thread was already possible and
>> >>would
>> >> >> block
>> >> >> > > >indefinitely. I think trying to silently change the behavior
>>is
>> >> >> probably
>> >> >> > > >not right. I.e. if the user calls close() in the callback
>>there
>> >>is
>> >> >> > > >actually
>> >> >> > > >some misunderstanding and they need to think more, silently
>> >>making
>> >> >> this
>> >> >> > > >not
>> >> >> > > >block will hide the problem from them which is the opposite
>>of
>> >> >>what we
>> >> >> > > >want.
>> >> >> > > >
>> >> >> > > >-Jay
>> >> >> > > >
>> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> >> >> <jqin@linkedin.com.invalid
>> >> >> > >
>> >> >> > > >wrote:
>> >> >> > > >
>> >> >> > > >> Hey Joe & Jay,
>> >> >> > > >>
>> >> >> > > >> Thanks for the comments on the voting thread. Since it
>>seems
>> >>we
>> >> >> > probably
>> >> >> > > >> will have more discussion on this, I am just replying from
>>the
>> >> >> > > >>discussion
>> >> >> > > >> thread here.
>> >> >> > > >> I’ve updated the KIP page to make it less like half-baked,
>> >> >>apologize
>> >> >> > for
>> >> >> > > >> the rush...
>> >> >> > > >>
>> >> >> > > >> The contract in current KIP is:
>> >> >> > > >>   1. close() - wait until all requests either are sent or
>> >>reach
>> >> >> > request
>> >> >> > > >> timeout.
>> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
>>close(),
>> >> >>i.e.
>> >> >> > Wait
>> >> >> > > >> until all requests are sent or reach request timeout
>> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
>>finish
>> >> >> sending
>> >> >> > > >>in 5
>> >> >> > > >> milliseconds, if something went wrong, just shutdown the
>> >>producer
>> >> >> > > >>anyway,
>> >> >> > > >> my callback will handle the failures.
>> >> >> > > >>
>> >> >> > > >> About how we define what timeout value stands for, I
>>actually
>> >> >> > struggled
>> >> >> > > >>a
>> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
>>should
>> >> >>mean
>> >> >> > > >> immediately, however it seems that all the existing java
>>class
>> >> >>have
>> >> >> > this
>> >> >> > > >> convention of timeout=0 means no timeout or never timeout
>> >> >> > > >>(Thread.join(0),
>> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
>>follow
>> >>the
>> >> >> > > >> intuition or we follow the convention. What I chose is to
>> >>follow
>> >> >>the
>> >> >> > > >> convention but document the interface to let user be aware
>>of
>> >>the
>> >> >> > usage.
>> >> >> > > >> The reason is that I think producer.close() is a public
>> >> >>interface so
>> >> >> > it
>> >> >> > > >> might be better to follow java convention. Whereas
>>selector is
>> >> >>not a
>> >> >> > > >> public interface that used by user, so as long as it makes
>> >>sense
>> >> >>to
>> >> >> > us,
>> >> >> > > >>it
>> >> >> > > >> is less a problem to be different from java convention.
>>That
>> >>said
>> >> >> > since
>> >> >> > > >> consumer.poll(timeout) is also a public interface, I think
>>it
>> >> >>also
>> >> >> > makes
>> >> >> > > >> sense to make producer.close() to have the same definition
>>of
>> >> >> > > >> consumer.poll(timeout).
>> >> >> > > >>
>> >> >> > > >> The main argument for keeping a timeout in close would be
>> >> >>separating
>> >> >> > the
>> >> >> > > >> close timeout from request timeout, which probably makes
>> >>sense. I
>> >> >> > would
>> >> >> > > >> guess typically the request timeout would be long (e.g. 60
>> >> >>seconds)
>> >> >> > > >> because we might want to consider retries with back off
>>time.
>> >>If
>> >> >>we
>> >> >> > have
>> >> >> > > >> multiple batches in accumulator, in worst case that could
>>take
>> >> >>up to
>> >> >> > > >> several minutes to complete all the requests. But when we
>> >>close a
>> >> >> > > >> producer, we might not want to wait for that long as it
>>might
>> >> >>cause
>> >> >> > some
>> >> >> > > >> other problem like deployment tool timeout.
>> >> >> > > >>
>> >> >> > > >> There is also a subtle difference between close(timeout)
>>and
>> >> >> > > >> flush(timeout). The only purpose for flush() is to write
>>data
>> >>to
>> >> >>the
>> >> >> > > >> broker, so it makes perfect sense to wait until request
>> >>timeout.
>> >> >>I
>> >> >> > think
>> >> >> > > >> that is why flush(timeout) looks strange. On the other
>>hand,
>> >>the
>> >> >>top
>> >> >> > > >> priority for close() is to close the producer rather than
>> >>flush()
>> >> >> > data,
>> >> >> > > >>so
>> >> >> > > >> close(timeout) gives guarantee on bounded waiting for its
>>main
>> >> >>job.
>> >> >> > > >>
>> >> >> > > >> Sorry for the confusion about forceClose flag. It is not a
>> >>public
>> >> >> > > >> interface. I mentioned it in Proposed Changes section
>>which I
>> >> >> thought
>> >> >> > > >>was
>> >> >> > > >> supposed to provide implementation details.
>> >> >> > > >>
>> >> >> > > >> Thanks again for all the comments and suggestions!
>> >> >> > > >>
>> >> >> > > >> Jiangjie (Becket) Qin
>> >> >> > > >>
>> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com>
>> wrote:
>> >> >> > > >>
>> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
>> >> >> > > >> >I¹d like to initiate the voting process if no further
>> >>comments
>> >> >>are
>> >> >> > > >> >received by tomorrow.
>> >> >> > > >> >
>> >> >> > > >> >Jiangjie (Becket) Qin
>> >> >> > > >> >
>> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com>
>>wrote:
>> >> >> > > >> >
>> >> >> > > >> >>Hey Jiangjie,
>> >> >> > > >> >>
>> >> >> > > >> >>Can you capture the full motivation and use cases for the
>> >> >>feature?
>> >> >> > > >>This
>> >> >> > > >> >>mentions your interest in having a way of aborting from
>> >>inside
>> >> >>the
>> >> >> > > >> >>Callback. But it doesn't really explain that usage or why
>> >>other
>> >> >> > people
>> >> >> > > >> >>would want to do that. It also doesn't list the primary
>>use
>> >> >>case
>> >> >> for
>> >> >> > > >> >>having
>> >> >> > > >> >>close with a bounded timeout which was to avoid blocking
>>too
>> >> >>long
>> >> >> on
>> >> >> > > >> >>shutdown.
>> >> >> > > >> >>
>> >> >> > > >> >>-Jay
>> >> >> > > >> >>
>> >> >> > > >> >>
>> >> >> > > >> >>
>> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >> >> > > >><jqin@linkedin.com.invalid
>> >> >> > > >> >
>> >> >> > > >> >>wrote:
>> >> >> > > >> >>
>> >> >> > > >> >>> Hi,
>> >> >> > > >> >>>
>> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new
>> >> >> producer.
>> >> >> > > >>Most
>> >> >> > > >> >>>of
>> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
>> >> >> Brahmbhatt
>> >> >> > > >>has
>> >> >> > > >> >>> already done a lot of work.
>> >> >> > > >> >>> Since this is an interface change so we are going
>>through
>> >>the
>> >> >> KIP
>> >> >> > > >> >>>process.
>> >> >> > > >> >>> Here is the KIP link:
>> >> >> > > >> >>>
>> >> >> > > >> >>>
>> >> >> > > >>
>> >> >> > >
>> >> >>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >> >> > > >> >>>2
>> >> >> > > >> >>>
>> >> >> > > >> >>> Thanks.
>> >> >> > > >> >>>
>> >> >> > > >> >>> Jiangjie (Becket) Qin
>> >> >> > > >> >>>
>> >> >> > > >> >
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > >
>> >> >> > >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Thanks,
>> >> >> Neha
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >--
>> >> >-- Guozhang
>> >>
>> >>
>> >
>> >
>> >--
>> >-- Guozhang
>>
>>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

Thanks Jiangjie,

After talking to you offline on this, I have been convinced and changed my
preference to blocking. The immediate shutdown approach does have some
unsafeness in some cases.

Guozhang

On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> It looks that the problem we want to solve and the purpose we want to
> achieve is:
> If user uses close() in callback, we want to let user be aware that they
> should use close(0) instead of close() in the callback.
>
> We have agreed that we will have an error log to inform user about this
> mis-usage. The options differ in the way how we can force user to take a
> look at that error log.
> There are two scenarios:
> 1. User does not expect the program to exit.
> 2. User expect the program to exit.
>
> For scenario 1), blocking will probably delay the discovery of the
> problem. Calling close(0) exposes the problem quicker. In this scenario
> producer just encounter a send failure when running normally.
> For scenario 2), blocking will expose the problem quick. Calling close(-1)
> might hide the problem. This scenario might include: a) Unit test for a
> send failure. b) Message sending during a close() call from a user thread.
>
> So as a summary table:
>
>                   Scenario 1)                         Scenario 2)
>
> Blocking      Delay problem discovery         Guaranteed problem discovery
>
> Close(-1)     Immediate problem discovery     Problem might be hidden
>
>
> Personally I prefer blocking because it seems providing more guarantees
> and safer.
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
>
> On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
> >HI Jiangjie,
> >
> >As far as I understand calling close() in the ioThread is not common, as
> >it
> >may only trigger when we saw some non-retriable error. Hence when user run
> >their program it is unlikely that close() will be triggered and problem
> >will be detected. So it seems to me that from the error detection aspect
> >these two options seems to be the same as people will usually detect it
> >from the producer metrics all dropping to 0.
> >
> >Guozhang
> >
> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> >wrote:
> >
> >> It seems there are two options we can choose from when close() is called
> >> from sender thread (callback):
> >> 1. Log an error and close the producer using close(-1)
> >> 2. Log an error and block.
> >> (Throwing an exception will not work because we catch all the exception
> >> thrown from user callback. It will just lead to an error log.)
> >>
> >> My concern for the first option is that the producer will be closed even
> >> if we logged and error. I am wondering if some user would not even take
> >>a
> >> look at the log if producer is closed normally. Because from the
> >>programs
> >> behavior, everything looks good. If that is the case, the error message
> >>we
> >> logged probably will just be ignored until some day when people check
> >>the
> >> log and see it.
> >>
> >> As for the second option, because producer does not close but blocks.
> >>User
> >> will notice this the first time they run the program. They probably will
> >> look at the log to see why producer could not be closed and they will
> >>see
> >> the error log we put there. So they will get informed about this
> >>mis-usage
> >> of close() in sender thread the first time they run the code instead of
> >> some time later.
> >>
> >> Personally I prefer the second one because it is more obvious that
> >> something was wrong.
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >>
> >> >Yeah I agree we should not silently change the behavior of the function
> >> >with the given parameters; and I would prefer
> >>error-logging-and-shutdown
> >> >over blocking when close(>0) is used, since as Neha suggested blocking
> >> >would also not proceed with sending any data, bu will just let users to
> >> >realize the issue later than sooner.
> >> >
> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io>
> >>wrote:
> >> >
> >> >> >
> >> >> > And I also agree it is better if we can make producer block when
> >> >> > close() is called from sender thread so user will notice something
> >> >>went
> >> >> > wrong.
> >> >>
> >> >>
> >> >> This isn't a great experience either. Why can't we just throw an
> >> >>exception
> >> >> for a behavior we know is incorrect and we'd like the user to know.
> >> >> Blocking as a means of doing that seems wrong and annoying.
> >> >>
> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com>
> >> wrote:
> >> >>
> >> >> > Cool.
> >> >> >
> >> >> > I think blocking is good or alternately throwing an exception
> >>directly
> >> >> from
> >> >> > close(). Basically I would just worry about subtly doing something
> >> >> slightly
> >> >> > different from what the user asked for as it will be hard to notice
> >> >>that
> >> >> > behavior difference.
> >> >> >
> >> >> > -Jay
> >> >> >
> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> >><jqin@linkedin.com.invalid
> >> >> >
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Jay,
> >> >> > >
> >> >> > > I have modified the KIP as you suggested. I thinks as long as we
> >> >>have
> >> >> > > consistent define for timeout across Kafka interface, there would
> >> >>be no
> >> >> > > problem. And I also agree it is better if we can make producer
> >>block
> >> >> when
> >> >> > > close() is called from sender thread so user will notice
> >>something
> >> >>went
> >> >> > > wrong.
> >> >> > >
> >> >> > > Thanks.
> >> >> > >
> >> >> > > Jiangjie (Becket) Qin
> >> >> > >
> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> >> > >
> >> >> > > >Hey Jiangjie,
> >> >> > > >
> >> >> > > >I think this is going to be very confusing that
> >> >> > > >  close(0) waits indefinitely and
> >> >> > > >  close(-1) waits for 0.
> >> >> > > >I understand this appears in other apis, but it is a constant
> >> >>cause of
> >> >> > > >bugs. Let's not repeat that mistake.
> >> >> > > >
> >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait
> >> >> indefinitely
> >> >> > > >as
> >> >> > > >we already have close() so having a magical constant for that is
> >> >> > > >redundant.
> >> >> > > >
> >> >> > > >Calling close() from the I/O thread was already possible and
> >>would
> >> >> block
> >> >> > > >indefinitely. I think trying to silently change the behavior is
> >> >> probably
> >> >> > > >not right. I.e. if the user calls close() in the callback there
> >>is
> >> >> > > >actually
> >> >> > > >some misunderstanding and they need to think more, silently
> >>making
> >> >> this
> >> >> > > >not
> >> >> > > >block will hide the problem from them which is the opposite of
> >> >>what we
> >> >> > > >want.
> >> >> > > >
> >> >> > > >-Jay
> >> >> > > >
> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> >> <jqin@linkedin.com.invalid
> >> >> > >
> >> >> > > >wrote:
> >> >> > > >
> >> >> > > >> Hey Joe & Jay,
> >> >> > > >>
> >> >> > > >> Thanks for the comments on the voting thread. Since it seems
> >>we
> >> >> > probably
> >> >> > > >> will have more discussion on this, I am just replying from the
> >> >> > > >>discussion
> >> >> > > >> thread here.
> >> >> > > >> I’ve updated the KIP page to make it less like half-baked,
> >> >>apologize
> >> >> > for
> >> >> > > >> the rush...
> >> >> > > >>
> >> >> > > >> The contract in current KIP is:
> >> >> > > >>   1. close() - wait until all requests either are sent or
> >>reach
> >> >> > request
> >> >> > > >> timeout.
> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(),
> >> >>i.e.
> >> >> > Wait
> >> >> > > >> until all requests are sent or reach request timeout
> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
> >> >> sending
> >> >> > > >>in 5
> >> >> > > >> milliseconds, if something went wrong, just shutdown the
> >>producer
> >> >> > > >>anyway,
> >> >> > > >> my callback will handle the failures.
> >> >> > > >>
> >> >> > > >> About how we define what timeout value stands for, I actually
> >> >> > struggled
> >> >> > > >>a
> >> >> > > >> little bit when wrote the patch. Intuitively, close(0) should
> >> >>mean
> >> >> > > >> immediately, however it seems that all the existing java class
> >> >>have
> >> >> > this
> >> >> > > >> convention of timeout=0 means no timeout or never timeout
> >> >> > > >>(Thread.join(0),
> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we follow
> >>the
> >> >> > > >> intuition or we follow the convention. What I chose is to
> >>follow
> >> >>the
> >> >> > > >> convention but document the interface to let user be aware of
> >>the
> >> >> > usage.
> >> >> > > >> The reason is that I think producer.close() is a public
> >> >>interface so
> >> >> > it
> >> >> > > >> might be better to follow java convention. Whereas selector is
> >> >>not a
> >> >> > > >> public interface that used by user, so as long as it makes
> >>sense
> >> >>to
> >> >> > us,
> >> >> > > >>it
> >> >> > > >> is less a problem to be different from java convention. That
> >>said
> >> >> > since
> >> >> > > >> consumer.poll(timeout) is also a public interface, I think it
> >> >>also
> >> >> > makes
> >> >> > > >> sense to make producer.close() to have the same definition of
> >> >> > > >> consumer.poll(timeout).
> >> >> > > >>
> >> >> > > >> The main argument for keeping a timeout in close would be
> >> >>separating
> >> >> > the
> >> >> > > >> close timeout from request timeout, which probably makes
> >>sense. I
> >> >> > would
> >> >> > > >> guess typically the request timeout would be long (e.g. 60
> >> >>seconds)
> >> >> > > >> because we might want to consider retries with back off time.
> >>If
> >> >>we
> >> >> > have
> >> >> > > >> multiple batches in accumulator, in worst case that could take
> >> >>up to
> >> >> > > >> several minutes to complete all the requests. But when we
> >>close a
> >> >> > > >> producer, we might not want to wait for that long as it might
> >> >>cause
> >> >> > some
> >> >> > > >> other problem like deployment tool timeout.
> >> >> > > >>
> >> >> > > >> There is also a subtle difference between close(timeout) and
> >> >> > > >> flush(timeout). The only purpose for flush() is to write data
> >>to
> >> >>the
> >> >> > > >> broker, so it makes perfect sense to wait until request
> >>timeout.
> >> >>I
> >> >> > think
> >> >> > > >> that is why flush(timeout) looks strange. On the other hand,
> >>the
> >> >>top
> >> >> > > >> priority for close() is to close the producer rather than
> >>flush()
> >> >> > data,
> >> >> > > >>so
> >> >> > > >> close(timeout) gives guarantee on bounded waiting for its main
> >> >>job.
> >> >> > > >>
> >> >> > > >> Sorry for the confusion about forceClose flag. It is not a
> >>public
> >> >> > > >> interface. I mentioned it in Proposed Changes section which I
> >> >> thought
> >> >> > > >>was
> >> >> > > >> supposed to provide implementation details.
> >> >> > > >>
> >> >> > > >> Thanks again for all the comments and suggestions!
> >> >> > > >>
> >> >> > > >> Jiangjie (Becket) Qin
> >> >> > > >>
> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com>
> wrote:
> >> >> > > >>
> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >> >> > > >> >I¹d like to initiate the voting process if no further
> >>comments
> >> >>are
> >> >> > > >> >received by tomorrow.
> >> >> > > >> >
> >> >> > > >> >Jiangjie (Becket) Qin
> >> >> > > >> >
> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> >> > > >> >
> >> >> > > >> >>Hey Jiangjie,
> >> >> > > >> >>
> >> >> > > >> >>Can you capture the full motivation and use cases for the
> >> >>feature?
> >> >> > > >>This
> >> >> > > >> >>mentions your interest in having a way of aborting from
> >>inside
> >> >>the
> >> >> > > >> >>Callback. But it doesn't really explain that usage or why
> >>other
> >> >> > people
> >> >> > > >> >>would want to do that. It also doesn't list the primary use
> >> >>case
> >> >> for
> >> >> > > >> >>having
> >> >> > > >> >>close with a bounded timeout which was to avoid blocking too
> >> >>long
> >> >> on
> >> >> > > >> >>shutdown.
> >> >> > > >> >>
> >> >> > > >> >>-Jay
> >> >> > > >> >>
> >> >> > > >> >>
> >> >> > > >> >>
> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >> >> > > >><jqin@linkedin.com.invalid
> >> >> > > >> >
> >> >> > > >> >>wrote:
> >> >> > > >> >>
> >> >> > > >> >>> Hi,
> >> >> > > >> >>>
> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new
> >> >> producer.
> >> >> > > >>Most
> >> >> > > >> >>>of
> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
> >> >> Brahmbhatt
> >> >> > > >>has
> >> >> > > >> >>> already done a lot of work.
> >> >> > > >> >>> Since this is an interface change so we are going through
> >>the
> >> >> KIP
> >> >> > > >> >>>process.
> >> >> > > >> >>> Here is the KIP link:
> >> >> > > >> >>>
> >> >> > > >> >>>
> >> >> > > >>
> >> >> > >
> >> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >> > > >> >>>2
> >> >> > > >> >>>
> >> >> > > >> >>> Thanks.
> >> >> > > >> >>>
> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> >> > > >> >>>
> >> >> > > >> >
> >> >> > > >>
> >> >> > > >>
> >> >> > >
> >> >> > >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Thanks,
> >> >> Neha
> >> >>
> >> >
> >> >
> >> >
> >> >--
> >> >-- Guozhang
> >>
> >>
> >
> >
> >--
> >-- Guozhang
>
>


-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

It looks that the problem we want to solve and the purpose we want to
achieve is:
If user uses close() in callback, we want to let user be aware that they
should use close(0) instead of close() in the callback.

We have agreed that we will have an error log to inform user about this
mis-usage. The options differ in the way how we can force user to take a
look at that error log.
There are two scenarios:
1. User does not expect the program to exit.
2. User expect the program to exit.

For scenario 1), blocking will probably delay the discovery of the
problem. Calling close(0) exposes the problem quicker. In this scenario
producer just encounter a send failure when running normally.
For scenario 2), blocking will expose the problem quick. Calling close(-1)
might hide the problem. This scenario might include: a) Unit test for a
send failure. b) Message sending during a close() call from a user thread.

So as a summary table:

                  Scenario 1)                         Scenario 2)

Blocking      Delay problem discovery         Guaranteed problem discovery

Close(-1)     Immediate problem discovery     Problem might be hidden


Personally I prefer blocking because it seems providing more guarantees
and safer.

Thanks.

Jiangjie (Becket) Qin


On 3/16/15, 10:11 AM, "Guozhang Wang" <wa...@gmail.com> wrote:

>HI Jiangjie,
>
>As far as I understand calling close() in the ioThread is not common, as
>it
>may only trigger when we saw some non-retriable error. Hence when user run
>their program it is unlikely that close() will be triggered and problem
>will be detected. So it seems to me that from the error detection aspect
>these two options seems to be the same as people will usually detect it
>from the producer metrics all dropping to 0.
>
>Guozhang
>
>On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> It seems there are two options we can choose from when close() is called
>> from sender thread (callback):
>> 1. Log an error and close the producer using close(-1)
>> 2. Log an error and block.
>> (Throwing an exception will not work because we catch all the exception
>> thrown from user callback. It will just lead to an error log.)
>>
>> My concern for the first option is that the producer will be closed even
>> if we logged and error. I am wondering if some user would not even take
>>a
>> look at the log if producer is closed normally. Because from the
>>programs
>> behavior, everything looks good. If that is the case, the error message
>>we
>> logged probably will just be ignored until some day when people check
>>the
>> log and see it.
>>
>> As for the second option, because producer does not close but blocks.
>>User
>> will notice this the first time they run the program. They probably will
>> look at the log to see why producer could not be closed and they will
>>see
>> the error log we put there. So they will get informed about this
>>mis-usage
>> of close() in sender thread the first time they run the code instead of
>> some time later.
>>
>> Personally I prefer the second one because it is more obvious that
>> something was wrong.
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>>
>> >Yeah I agree we should not silently change the behavior of the function
>> >with the given parameters; and I would prefer
>>error-logging-and-shutdown
>> >over blocking when close(>0) is used, since as Neha suggested blocking
>> >would also not proceed with sending any data, bu will just let users to
>> >realize the issue later than sooner.
>> >
>> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io>
>>wrote:
>> >
>> >> >
>> >> > And I also agree it is better if we can make producer block when
>> >> > close() is called from sender thread so user will notice something
>> >>went
>> >> > wrong.
>> >>
>> >>
>> >> This isn't a great experience either. Why can't we just throw an
>> >>exception
>> >> for a behavior we know is incorrect and we'd like the user to know.
>> >> Blocking as a means of doing that seems wrong and annoying.
>> >>
>> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com>
>> wrote:
>> >>
>> >> > Cool.
>> >> >
>> >> > I think blocking is good or alternately throwing an exception
>>directly
>> >> from
>> >> > close(). Basically I would just worry about subtly doing something
>> >> slightly
>> >> > different from what the user asked for as it will be hard to notice
>> >>that
>> >> > behavior difference.
>> >> >
>> >> > -Jay
>> >> >
>> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>> >><jqin@linkedin.com.invalid
>> >> >
>> >> > wrote:
>> >> >
>> >> > > Hi Jay,
>> >> > >
>> >> > > I have modified the KIP as you suggested. I thinks as long as we
>> >>have
>> >> > > consistent define for timeout across Kafka interface, there would
>> >>be no
>> >> > > problem. And I also agree it is better if we can make producer
>>block
>> >> when
>> >> > > close() is called from sender thread so user will notice
>>something
>> >>went
>> >> > > wrong.
>> >> > >
>> >> > > Thanks.
>> >> > >
>> >> > > Jiangjie (Becket) Qin
>> >> > >
>> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >> > >
>> >> > > >Hey Jiangjie,
>> >> > > >
>> >> > > >I think this is going to be very confusing that
>> >> > > >  close(0) waits indefinitely and
>> >> > > >  close(-1) waits for 0.
>> >> > > >I understand this appears in other apis, but it is a constant
>> >>cause of
>> >> > > >bugs. Let's not repeat that mistake.
>> >> > > >
>> >> > > >Let's make close(0) wait for 0. We don't need a way to wait
>> >> indefinitely
>> >> > > >as
>> >> > > >we already have close() so having a magical constant for that is
>> >> > > >redundant.
>> >> > > >
>> >> > > >Calling close() from the I/O thread was already possible and
>>would
>> >> block
>> >> > > >indefinitely. I think trying to silently change the behavior is
>> >> probably
>> >> > > >not right. I.e. if the user calls close() in the callback there
>>is
>> >> > > >actually
>> >> > > >some misunderstanding and they need to think more, silently
>>making
>> >> this
>> >> > > >not
>> >> > > >block will hide the problem from them which is the opposite of
>> >>what we
>> >> > > >want.
>> >> > > >
>> >> > > >-Jay
>> >> > > >
>> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> >> <jqin@linkedin.com.invalid
>> >> > >
>> >> > > >wrote:
>> >> > > >
>> >> > > >> Hey Joe & Jay,
>> >> > > >>
>> >> > > >> Thanks for the comments on the voting thread. Since it seems
>>we
>> >> > probably
>> >> > > >> will have more discussion on this, I am just replying from the
>> >> > > >>discussion
>> >> > > >> thread here.
>> >> > > >> I’ve updated the KIP page to make it less like half-baked,
>> >>apologize
>> >> > for
>> >> > > >> the rush...
>> >> > > >>
>> >> > > >> The contract in current KIP is:
>> >> > > >>   1. close() - wait until all requests either are sent or
>>reach
>> >> > request
>> >> > > >> timeout.
>> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(),
>> >>i.e.
>> >> > Wait
>> >> > > >> until all requests are sent or reach request timeout
>> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
>> >> sending
>> >> > > >>in 5
>> >> > > >> milliseconds, if something went wrong, just shutdown the
>>producer
>> >> > > >>anyway,
>> >> > > >> my callback will handle the failures.
>> >> > > >>
>> >> > > >> About how we define what timeout value stands for, I actually
>> >> > struggled
>> >> > > >>a
>> >> > > >> little bit when wrote the patch. Intuitively, close(0) should
>> >>mean
>> >> > > >> immediately, however it seems that all the existing java class
>> >>have
>> >> > this
>> >> > > >> convention of timeout=0 means no timeout or never timeout
>> >> > > >>(Thread.join(0),
>> >> > > >> Object.wait(0), etc.) So here the dilemma is either we follow
>>the
>> >> > > >> intuition or we follow the convention. What I chose is to
>>follow
>> >>the
>> >> > > >> convention but document the interface to let user be aware of
>>the
>> >> > usage.
>> >> > > >> The reason is that I think producer.close() is a public
>> >>interface so
>> >> > it
>> >> > > >> might be better to follow java convention. Whereas selector is
>> >>not a
>> >> > > >> public interface that used by user, so as long as it makes
>>sense
>> >>to
>> >> > us,
>> >> > > >>it
>> >> > > >> is less a problem to be different from java convention. That
>>said
>> >> > since
>> >> > > >> consumer.poll(timeout) is also a public interface, I think it
>> >>also
>> >> > makes
>> >> > > >> sense to make producer.close() to have the same definition of
>> >> > > >> consumer.poll(timeout).
>> >> > > >>
>> >> > > >> The main argument for keeping a timeout in close would be
>> >>separating
>> >> > the
>> >> > > >> close timeout from request timeout, which probably makes
>>sense. I
>> >> > would
>> >> > > >> guess typically the request timeout would be long (e.g. 60
>> >>seconds)
>> >> > > >> because we might want to consider retries with back off time.
>>If
>> >>we
>> >> > have
>> >> > > >> multiple batches in accumulator, in worst case that could take
>> >>up to
>> >> > > >> several minutes to complete all the requests. But when we
>>close a
>> >> > > >> producer, we might not want to wait for that long as it might
>> >>cause
>> >> > some
>> >> > > >> other problem like deployment tool timeout.
>> >> > > >>
>> >> > > >> There is also a subtle difference between close(timeout) and
>> >> > > >> flush(timeout). The only purpose for flush() is to write data
>>to
>> >>the
>> >> > > >> broker, so it makes perfect sense to wait until request
>>timeout.
>> >>I
>> >> > think
>> >> > > >> that is why flush(timeout) looks strange. On the other hand,
>>the
>> >>top
>> >> > > >> priority for close() is to close the producer rather than
>>flush()
>> >> > data,
>> >> > > >>so
>> >> > > >> close(timeout) gives guarantee on bounded waiting for its main
>> >>job.
>> >> > > >>
>> >> > > >> Sorry for the confusion about forceClose flag. It is not a
>>public
>> >> > > >> interface. I mentioned it in Proposed Changes section which I
>> >> thought
>> >> > > >>was
>> >> > > >> supposed to provide implementation details.
>> >> > > >>
>> >> > > >> Thanks again for all the comments and suggestions!
>> >> > > >>
>> >> > > >> Jiangjie (Becket) Qin
>> >> > > >>
>> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> >> > > >>
>> >> > > >> >The KIP page has been updated per Jay¹s comments.
>> >> > > >> >I¹d like to initiate the voting process if no further
>>comments
>> >>are
>> >> > > >> >received by tomorrow.
>> >> > > >> >
>> >> > > >> >Jiangjie (Becket) Qin
>> >> > > >> >
>> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >> > > >> >
>> >> > > >> >>Hey Jiangjie,
>> >> > > >> >>
>> >> > > >> >>Can you capture the full motivation and use cases for the
>> >>feature?
>> >> > > >>This
>> >> > > >> >>mentions your interest in having a way of aborting from
>>inside
>> >>the
>> >> > > >> >>Callback. But it doesn't really explain that usage or why
>>other
>> >> > people
>> >> > > >> >>would want to do that. It also doesn't list the primary use
>> >>case
>> >> for
>> >> > > >> >>having
>> >> > > >> >>close with a bounded timeout which was to avoid blocking too
>> >>long
>> >> on
>> >> > > >> >>shutdown.
>> >> > > >> >>
>> >> > > >> >>-Jay
>> >> > > >> >>
>> >> > > >> >>
>> >> > > >> >>
>> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >> > > >><jqin@linkedin.com.invalid
>> >> > > >> >
>> >> > > >> >>wrote:
>> >> > > >> >>
>> >> > > >> >>> Hi,
>> >> > > >> >>>
>> >> > > >> >>> I just created a KIP for adding a close(timeout) to new
>> >> producer.
>> >> > > >>Most
>> >> > > >> >>>of
>> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
>> >> Brahmbhatt
>> >> > > >>has
>> >> > > >> >>> already done a lot of work.
>> >> > > >> >>> Since this is an interface change so we are going through
>>the
>> >> KIP
>> >> > > >> >>>process.
>> >> > > >> >>> Here is the KIP link:
>> >> > > >> >>>
>> >> > > >> >>>
>> >> > > >>
>> >> > >
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >> > > >> >>>2
>> >> > > >> >>>
>> >> > > >> >>> Thanks.
>> >> > > >> >>>
>> >> > > >> >>> Jiangjie (Becket) Qin
>> >> > > >> >>>
>> >> > > >> >
>> >> > > >>
>> >> > > >>
>> >> > >
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Thanks,
>> >> Neha
>> >>
>> >
>> >
>> >
>> >--
>> >-- Guozhang
>>
>>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

HI Jiangjie,

As far as I understand calling close() in the ioThread is not common, as it
may only trigger when we saw some non-retriable error. Hence when user run
their program it is unlikely that close() will be triggered and problem
will be detected. So it seems to me that from the error detection aspect
these two options seems to be the same as people will usually detect it
from the producer metrics all dropping to 0.

Guozhang

On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> It seems there are two options we can choose from when close() is called
> from sender thread (callback):
> 1. Log an error and close the producer using close(-1)
> 2. Log an error and block.
> (Throwing an exception will not work because we catch all the exception
> thrown from user callback. It will just lead to an error log.)
>
> My concern for the first option is that the producer will be closed even
> if we logged and error. I am wondering if some user would not even take a
> look at the log if producer is closed normally. Because from the programs
> behavior, everything looks good. If that is the case, the error message we
> logged probably will just be ignored until some day when people check the
> log and see it.
>
> As for the second option, because producer does not close but blocks. User
> will notice this the first time they run the program. They probably will
> look at the log to see why producer could not be closed and they will see
> the error log we put there. So they will get informed about this mis-usage
> of close() in sender thread the first time they run the code instead of
> some time later.
>
> Personally I prefer the second one because it is more obvious that
> something was wrong.
>
> Jiangjie (Becket) Qin
>
> On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
> >Yeah I agree we should not silently change the behavior of the function
> >with the given parameters; and I would prefer error-logging-and-shutdown
> >over blocking when close(>0) is used, since as Neha suggested blocking
> >would also not proceed with sending any data, bu will just let users to
> >realize the issue later than sooner.
> >
> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io> wrote:
> >
> >> >
> >> > And I also agree it is better if we can make producer block when
> >> > close() is called from sender thread so user will notice something
> >>went
> >> > wrong.
> >>
> >>
> >> This isn't a great experience either. Why can't we just throw an
> >>exception
> >> for a behavior we know is incorrect and we'd like the user to know.
> >> Blocking as a means of doing that seems wrong and annoying.
> >>
> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com>
> wrote:
> >>
> >> > Cool.
> >> >
> >> > I think blocking is good or alternately throwing an exception directly
> >> from
> >> > close(). Basically I would just worry about subtly doing something
> >> slightly
> >> > different from what the user asked for as it will be hard to notice
> >>that
> >> > behavior difference.
> >> >
> >> > -Jay
> >> >
> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >><jqin@linkedin.com.invalid
> >> >
> >> > wrote:
> >> >
> >> > > Hi Jay,
> >> > >
> >> > > I have modified the KIP as you suggested. I thinks as long as we
> >>have
> >> > > consistent define for timeout across Kafka interface, there would
> >>be no
> >> > > problem. And I also agree it is better if we can make producer block
> >> when
> >> > > close() is called from sender thread so user will notice something
> >>went
> >> > > wrong.
> >> > >
> >> > > Thanks.
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> > >
> >> > > >Hey Jiangjie,
> >> > > >
> >> > > >I think this is going to be very confusing that
> >> > > >  close(0) waits indefinitely and
> >> > > >  close(-1) waits for 0.
> >> > > >I understand this appears in other apis, but it is a constant
> >>cause of
> >> > > >bugs. Let's not repeat that mistake.
> >> > > >
> >> > > >Let's make close(0) wait for 0. We don't need a way to wait
> >> indefinitely
> >> > > >as
> >> > > >we already have close() so having a magical constant for that is
> >> > > >redundant.
> >> > > >
> >> > > >Calling close() from the I/O thread was already possible and would
> >> block
> >> > > >indefinitely. I think trying to silently change the behavior is
> >> probably
> >> > > >not right. I.e. if the user calls close() in the callback there is
> >> > > >actually
> >> > > >some misunderstanding and they need to think more, silently making
> >> this
> >> > > >not
> >> > > >block will hide the problem from them which is the opposite of
> >>what we
> >> > > >want.
> >> > > >
> >> > > >-Jay
> >> > > >
> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> <jqin@linkedin.com.invalid
> >> > >
> >> > > >wrote:
> >> > > >
> >> > > >> Hey Joe & Jay,
> >> > > >>
> >> > > >> Thanks for the comments on the voting thread. Since it seems we
> >> > probably
> >> > > >> will have more discussion on this, I am just replying from the
> >> > > >>discussion
> >> > > >> thread here.
> >> > > >> I’ve updated the KIP page to make it less like half-baked,
> >>apologize
> >> > for
> >> > > >> the rush...
> >> > > >>
> >> > > >> The contract in current KIP is:
> >> > > >>   1. close() - wait until all requests either are sent or reach
> >> > request
> >> > > >> timeout.
> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(),
> >>i.e.
> >> > Wait
> >> > > >> until all requests are sent or reach request timeout
> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
> >> sending
> >> > > >>in 5
> >> > > >> milliseconds, if something went wrong, just shutdown the producer
> >> > > >>anyway,
> >> > > >> my callback will handle the failures.
> >> > > >>
> >> > > >> About how we define what timeout value stands for, I actually
> >> > struggled
> >> > > >>a
> >> > > >> little bit when wrote the patch. Intuitively, close(0) should
> >>mean
> >> > > >> immediately, however it seems that all the existing java class
> >>have
> >> > this
> >> > > >> convention of timeout=0 means no timeout or never timeout
> >> > > >>(Thread.join(0),
> >> > > >> Object.wait(0), etc.) So here the dilemma is either we follow the
> >> > > >> intuition or we follow the convention. What I chose is to follow
> >>the
> >> > > >> convention but document the interface to let user be aware of the
> >> > usage.
> >> > > >> The reason is that I think producer.close() is a public
> >>interface so
> >> > it
> >> > > >> might be better to follow java convention. Whereas selector is
> >>not a
> >> > > >> public interface that used by user, so as long as it makes sense
> >>to
> >> > us,
> >> > > >>it
> >> > > >> is less a problem to be different from java convention. That said
> >> > since
> >> > > >> consumer.poll(timeout) is also a public interface, I think it
> >>also
> >> > makes
> >> > > >> sense to make producer.close() to have the same definition of
> >> > > >> consumer.poll(timeout).
> >> > > >>
> >> > > >> The main argument for keeping a timeout in close would be
> >>separating
> >> > the
> >> > > >> close timeout from request timeout, which probably makes sense. I
> >> > would
> >> > > >> guess typically the request timeout would be long (e.g. 60
> >>seconds)
> >> > > >> because we might want to consider retries with back off time. If
> >>we
> >> > have
> >> > > >> multiple batches in accumulator, in worst case that could take
> >>up to
> >> > > >> several minutes to complete all the requests. But when we close a
> >> > > >> producer, we might not want to wait for that long as it might
> >>cause
> >> > some
> >> > > >> other problem like deployment tool timeout.
> >> > > >>
> >> > > >> There is also a subtle difference between close(timeout) and
> >> > > >> flush(timeout). The only purpose for flush() is to write data to
> >>the
> >> > > >> broker, so it makes perfect sense to wait until request timeout.
> >>I
> >> > think
> >> > > >> that is why flush(timeout) looks strange. On the other hand, the
> >>top
> >> > > >> priority for close() is to close the producer rather than flush()
> >> > data,
> >> > > >>so
> >> > > >> close(timeout) gives guarantee on bounded waiting for its main
> >>job.
> >> > > >>
> >> > > >> Sorry for the confusion about forceClose flag. It is not a public
> >> > > >> interface. I mentioned it in Proposed Changes section which I
> >> thought
> >> > > >>was
> >> > > >> supposed to provide implementation details.
> >> > > >>
> >> > > >> Thanks again for all the comments and suggestions!
> >> > > >>
> >> > > >> Jiangjie (Becket) Qin
> >> > > >>
> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >> > > >>
> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >> > > >> >I¹d like to initiate the voting process if no further comments
> >>are
> >> > > >> >received by tomorrow.
> >> > > >> >
> >> > > >> >Jiangjie (Becket) Qin
> >> > > >> >
> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> > > >> >
> >> > > >> >>Hey Jiangjie,
> >> > > >> >>
> >> > > >> >>Can you capture the full motivation and use cases for the
> >>feature?
> >> > > >>This
> >> > > >> >>mentions your interest in having a way of aborting from inside
> >>the
> >> > > >> >>Callback. But it doesn't really explain that usage or why other
> >> > people
> >> > > >> >>would want to do that. It also doesn't list the primary use
> >>case
> >> for
> >> > > >> >>having
> >> > > >> >>close with a bounded timeout which was to avoid blocking too
> >>long
> >> on
> >> > > >> >>shutdown.
> >> > > >> >>
> >> > > >> >>-Jay
> >> > > >> >>
> >> > > >> >>
> >> > > >> >>
> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >> > > >><jqin@linkedin.com.invalid
> >> > > >> >
> >> > > >> >>wrote:
> >> > > >> >>
> >> > > >> >>> Hi,
> >> > > >> >>>
> >> > > >> >>> I just created a KIP for adding a close(timeout) to new
> >> producer.
> >> > > >>Most
> >> > > >> >>>of
> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
> >> Brahmbhatt
> >> > > >>has
> >> > > >> >>> already done a lot of work.
> >> > > >> >>> Since this is an interface change so we are going through the
> >> KIP
> >> > > >> >>>process.
> >> > > >> >>> Here is the KIP link:
> >> > > >> >>>
> >> > > >> >>>
> >> > > >>
> >> > >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> > > >> >>>2
> >> > > >> >>>
> >> > > >> >>> Thanks.
> >> > > >> >>>
> >> > > >> >>> Jiangjie (Becket) Qin
> >> > > >> >>>
> >> > > >> >
> >> > > >>
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Neha
> >>
> >
> >
> >
> >--
> >-- Guozhang
>
>


-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

It seems there are two options we can choose from when close() is called
from sender thread (callback):
1. Log an error and close the producer using close(-1)
2. Log an error and block.
(Throwing an exception will not work because we catch all the exception
thrown from user callback. It will just lead to an error log.)

My concern for the first option is that the producer will be closed even
if we logged and error. I am wondering if some user would not even take a
look at the log if producer is closed normally. Because from the programs
behavior, everything looks good. If that is the case, the error message we
logged probably will just be ignored until some day when people check the
log and see it.

As for the second option, because producer does not close but blocks. User
will notice this the first time they run the program. They probably will
look at the log to see why producer could not be closed and they will see
the error log we put there. So they will get informed about this mis-usage
of close() in sender thread the first time they run the code instead of
some time later.

Personally I prefer the second one because it is more obvious that
something was wrong.

Jiangjie (Becket) Qin

On 3/15/15, 4:27 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

>Yeah I agree we should not silently change the behavior of the function
>with the given parameters; and I would prefer error-logging-and-shutdown
>over blocking when close(>0) is used, since as Neha suggested blocking
>would also not proceed with sending any data, bu will just let users to
>realize the issue later than sooner.
>
>On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io> wrote:
>
>> >
>> > And I also agree it is better if we can make producer block when
>> > close() is called from sender thread so user will notice something
>>went
>> > wrong.
>>
>>
>> This isn't a great experience either. Why can't we just throw an
>>exception
>> for a behavior we know is incorrect and we'd like the user to know.
>> Blocking as a means of doing that seems wrong and annoying.
>>
>> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com> wrote:
>>
>> > Cool.
>> >
>> > I think blocking is good or alternately throwing an exception directly
>> from
>> > close(). Basically I would just worry about subtly doing something
>> slightly
>> > different from what the user asked for as it will be hard to notice
>>that
>> > behavior difference.
>> >
>> > -Jay
>> >
>> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> > wrote:
>> >
>> > > Hi Jay,
>> > >
>> > > I have modified the KIP as you suggested. I thinks as long as we
>>have
>> > > consistent define for timeout across Kafka interface, there would
>>be no
>> > > problem. And I also agree it is better if we can make producer block
>> when
>> > > close() is called from sender thread so user will notice something
>>went
>> > > wrong.
>> > >
>> > > Thanks.
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> > >
>> > > >Hey Jiangjie,
>> > > >
>> > > >I think this is going to be very confusing that
>> > > >  close(0) waits indefinitely and
>> > > >  close(-1) waits for 0.
>> > > >I understand this appears in other apis, but it is a constant
>>cause of
>> > > >bugs. Let's not repeat that mistake.
>> > > >
>> > > >Let's make close(0) wait for 0. We don't need a way to wait
>> indefinitely
>> > > >as
>> > > >we already have close() so having a magical constant for that is
>> > > >redundant.
>> > > >
>> > > >Calling close() from the I/O thread was already possible and would
>> block
>> > > >indefinitely. I think trying to silently change the behavior is
>> probably
>> > > >not right. I.e. if the user calls close() in the callback there is
>> > > >actually
>> > > >some misunderstanding and they need to think more, silently making
>> this
>> > > >not
>> > > >block will hide the problem from them which is the opposite of
>>what we
>> > > >want.
>> > > >
>> > > >-Jay
>> > > >
>> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>> <jqin@linkedin.com.invalid
>> > >
>> > > >wrote:
>> > > >
>> > > >> Hey Joe & Jay,
>> > > >>
>> > > >> Thanks for the comments on the voting thread. Since it seems we
>> > probably
>> > > >> will have more discussion on this, I am just replying from the
>> > > >>discussion
>> > > >> thread here.
>> > > >> I’ve updated the KIP page to make it less like half-baked,
>>apologize
>> > for
>> > > >> the rush...
>> > > >>
>> > > >> The contract in current KIP is:
>> > > >>   1. close() - wait until all requests either are sent or reach
>> > request
>> > > >> timeout.
>> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(),
>>i.e.
>> > Wait
>> > > >> until all requests are sent or reach request timeout
>> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
>> sending
>> > > >>in 5
>> > > >> milliseconds, if something went wrong, just shutdown the producer
>> > > >>anyway,
>> > > >> my callback will handle the failures.
>> > > >>
>> > > >> About how we define what timeout value stands for, I actually
>> > struggled
>> > > >>a
>> > > >> little bit when wrote the patch. Intuitively, close(0) should
>>mean
>> > > >> immediately, however it seems that all the existing java class
>>have
>> > this
>> > > >> convention of timeout=0 means no timeout or never timeout
>> > > >>(Thread.join(0),
>> > > >> Object.wait(0), etc.) So here the dilemma is either we follow the
>> > > >> intuition or we follow the convention. What I chose is to follow
>>the
>> > > >> convention but document the interface to let user be aware of the
>> > usage.
>> > > >> The reason is that I think producer.close() is a public
>>interface so
>> > it
>> > > >> might be better to follow java convention. Whereas selector is
>>not a
>> > > >> public interface that used by user, so as long as it makes sense
>>to
>> > us,
>> > > >>it
>> > > >> is less a problem to be different from java convention. That said
>> > since
>> > > >> consumer.poll(timeout) is also a public interface, I think it
>>also
>> > makes
>> > > >> sense to make producer.close() to have the same definition of
>> > > >> consumer.poll(timeout).
>> > > >>
>> > > >> The main argument for keeping a timeout in close would be
>>separating
>> > the
>> > > >> close timeout from request timeout, which probably makes sense. I
>> > would
>> > > >> guess typically the request timeout would be long (e.g. 60
>>seconds)
>> > > >> because we might want to consider retries with back off time. If
>>we
>> > have
>> > > >> multiple batches in accumulator, in worst case that could take
>>up to
>> > > >> several minutes to complete all the requests. But when we close a
>> > > >> producer, we might not want to wait for that long as it might
>>cause
>> > some
>> > > >> other problem like deployment tool timeout.
>> > > >>
>> > > >> There is also a subtle difference between close(timeout) and
>> > > >> flush(timeout). The only purpose for flush() is to write data to
>>the
>> > > >> broker, so it makes perfect sense to wait until request timeout.
>>I
>> > think
>> > > >> that is why flush(timeout) looks strange. On the other hand, the
>>top
>> > > >> priority for close() is to close the producer rather than flush()
>> > data,
>> > > >>so
>> > > >> close(timeout) gives guarantee on bounded waiting for its main
>>job.
>> > > >>
>> > > >> Sorry for the confusion about forceClose flag. It is not a public
>> > > >> interface. I mentioned it in Proposed Changes section which I
>> thought
>> > > >>was
>> > > >> supposed to provide implementation details.
>> > > >>
>> > > >> Thanks again for all the comments and suggestions!
>> > > >>
>> > > >> Jiangjie (Becket) Qin
>> > > >>
>> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> > > >>
>> > > >> >The KIP page has been updated per Jay¹s comments.
>> > > >> >I¹d like to initiate the voting process if no further comments
>>are
>> > > >> >received by tomorrow.
>> > > >> >
>> > > >> >Jiangjie (Becket) Qin
>> > > >> >
>> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> > > >> >
>> > > >> >>Hey Jiangjie,
>> > > >> >>
>> > > >> >>Can you capture the full motivation and use cases for the
>>feature?
>> > > >>This
>> > > >> >>mentions your interest in having a way of aborting from inside
>>the
>> > > >> >>Callback. But it doesn't really explain that usage or why other
>> > people
>> > > >> >>would want to do that. It also doesn't list the primary use
>>case
>> for
>> > > >> >>having
>> > > >> >>close with a bounded timeout which was to avoid blocking too
>>long
>> on
>> > > >> >>shutdown.
>> > > >> >>
>> > > >> >>-Jay
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> > > >><jqin@linkedin.com.invalid
>> > > >> >
>> > > >> >>wrote:
>> > > >> >>
>> > > >> >>> Hi,
>> > > >> >>>
>> > > >> >>> I just created a KIP for adding a close(timeout) to new
>> producer.
>> > > >>Most
>> > > >> >>>of
>> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
>> Brahmbhatt
>> > > >>has
>> > > >> >>> already done a lot of work.
>> > > >> >>> Since this is an interface change so we are going through the
>> KIP
>> > > >> >>>process.
>> > > >> >>> Here is the KIP link:
>> > > >> >>>
>> > > >> >>>
>> > > >>
>> > >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> > > >> >>>2
>> > > >> >>>
>> > > >> >>> Thanks.
>> > > >> >>>
>> > > >> >>> Jiangjie (Becket) Qin
>> > > >> >>>
>> > > >> >
>> > > >>
>> > > >>
>> > >
>> > >
>> >
>>
>>
>>
>> --
>> Thanks,
>> Neha
>>
>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

Yeah I agree we should not silently change the behavior of the function
with the given parameters; and I would prefer error-logging-and-shutdown
over blocking when close(>0) is used, since as Neha suggested blocking
would also not proceed with sending any data, bu will just let users to
realize the issue later than sooner.

On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <ne...@confluent.io> wrote:

> >
> > And I also agree it is better if we can make producer block when
> > close() is called from sender thread so user will notice something went
> > wrong.
>
>
> This isn't a great experience either. Why can't we just throw an exception
> for a behavior we know is incorrect and we'd like the user to know.
> Blocking as a means of doing that seems wrong and annoying.
>
> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Cool.
> >
> > I think blocking is good or alternately throwing an exception directly
> from
> > close(). Basically I would just worry about subtly doing something
> slightly
> > different from what the user asked for as it will be hard to notice that
> > behavior difference.
> >
> > -Jay
> >
> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > wrote:
> >
> > > Hi Jay,
> > >
> > > I have modified the KIP as you suggested. I thinks as long as we have
> > > consistent define for timeout across Kafka interface, there would be no
> > > problem. And I also agree it is better if we can make producer block
> when
> > > close() is called from sender thread so user will notice something went
> > > wrong.
> > >
> > > Thanks.
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> > >
> > > >Hey Jiangjie,
> > > >
> > > >I think this is going to be very confusing that
> > > >  close(0) waits indefinitely and
> > > >  close(-1) waits for 0.
> > > >I understand this appears in other apis, but it is a constant cause of
> > > >bugs. Let's not repeat that mistake.
> > > >
> > > >Let's make close(0) wait for 0. We don't need a way to wait
> indefinitely
> > > >as
> > > >we already have close() so having a magical constant for that is
> > > >redundant.
> > > >
> > > >Calling close() from the I/O thread was already possible and would
> block
> > > >indefinitely. I think trying to silently change the behavior is
> probably
> > > >not right. I.e. if the user calls close() in the callback there is
> > > >actually
> > > >some misunderstanding and they need to think more, silently making
> this
> > > >not
> > > >block will hide the problem from them which is the opposite of what we
> > > >want.
> > > >
> > > >-Jay
> > > >
> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> <jqin@linkedin.com.invalid
> > >
> > > >wrote:
> > > >
> > > >> Hey Joe & Jay,
> > > >>
> > > >> Thanks for the comments on the voting thread. Since it seems we
> > probably
> > > >> will have more discussion on this, I am just replying from the
> > > >>discussion
> > > >> thread here.
> > > >> I’ve updated the KIP page to make it less like half-baked, apologize
> > for
> > > >> the rush...
> > > >>
> > > >> The contract in current KIP is:
> > > >>   1. close() - wait until all requests either are sent or reach
> > request
> > > >> timeout.
> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e.
> > Wait
> > > >> until all requests are sent or reach request timeout
> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
> sending
> > > >>in 5
> > > >> milliseconds, if something went wrong, just shutdown the producer
> > > >>anyway,
> > > >> my callback will handle the failures.
> > > >>
> > > >> About how we define what timeout value stands for, I actually
> > struggled
> > > >>a
> > > >> little bit when wrote the patch. Intuitively, close(0) should mean
> > > >> immediately, however it seems that all the existing java class have
> > this
> > > >> convention of timeout=0 means no timeout or never timeout
> > > >>(Thread.join(0),
> > > >> Object.wait(0), etc.) So here the dilemma is either we follow the
> > > >> intuition or we follow the convention. What I chose is to follow the
> > > >> convention but document the interface to let user be aware of the
> > usage.
> > > >> The reason is that I think producer.close() is a public interface so
> > it
> > > >> might be better to follow java convention. Whereas selector is not a
> > > >> public interface that used by user, so as long as it makes sense to
> > us,
> > > >>it
> > > >> is less a problem to be different from java convention. That said
> > since
> > > >> consumer.poll(timeout) is also a public interface, I think it also
> > makes
> > > >> sense to make producer.close() to have the same definition of
> > > >> consumer.poll(timeout).
> > > >>
> > > >> The main argument for keeping a timeout in close would be separating
> > the
> > > >> close timeout from request timeout, which probably makes sense. I
> > would
> > > >> guess typically the request timeout would be long (e.g. 60 seconds)
> > > >> because we might want to consider retries with back off time. If we
> > have
> > > >> multiple batches in accumulator, in worst case that could take up to
> > > >> several minutes to complete all the requests. But when we close a
> > > >> producer, we might not want to wait for that long as it might cause
> > some
> > > >> other problem like deployment tool timeout.
> > > >>
> > > >> There is also a subtle difference between close(timeout) and
> > > >> flush(timeout). The only purpose for flush() is to write data to the
> > > >> broker, so it makes perfect sense to wait until request timeout. I
> > think
> > > >> that is why flush(timeout) looks strange. On the other hand, the top
> > > >> priority for close() is to close the producer rather than flush()
> > data,
> > > >>so
> > > >> close(timeout) gives guarantee on bounded waiting for its main job.
> > > >>
> > > >> Sorry for the confusion about forceClose flag. It is not a public
> > > >> interface. I mentioned it in Proposed Changes section which I
> thought
> > > >>was
> > > >> supposed to provide implementation details.
> > > >>
> > > >> Thanks again for all the comments and suggestions!
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> > > >>
> > > >> >The KIP page has been updated per Jay¹s comments.
> > > >> >I¹d like to initiate the voting process if no further comments are
> > > >> >received by tomorrow.
> > > >> >
> > > >> >Jiangjie (Becket) Qin
> > > >> >
> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> > > >> >
> > > >> >>Hey Jiangjie,
> > > >> >>
> > > >> >>Can you capture the full motivation and use cases for the feature?
> > > >>This
> > > >> >>mentions your interest in having a way of aborting from inside the
> > > >> >>Callback. But it doesn't really explain that usage or why other
> > people
> > > >> >>would want to do that. It also doesn't list the primary use case
> for
> > > >> >>having
> > > >> >>close with a bounded timeout which was to avoid blocking too long
> on
> > > >> >>shutdown.
> > > >> >>
> > > >> >>-Jay
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > > >><jqin@linkedin.com.invalid
> > > >> >
> > > >> >>wrote:
> > > >> >>
> > > >> >>> Hi,
> > > >> >>>
> > > >> >>> I just created a KIP for adding a close(timeout) to new
> producer.
> > > >>Most
> > > >> >>>of
> > > >> >>> the previous discussions are in KAFKA-1660 where Parth
> Brahmbhatt
> > > >>has
> > > >> >>> already done a lot of work.
> > > >> >>> Since this is an interface change so we are going through the
> KIP
> > > >> >>>process.
> > > >> >>> Here is the KIP link:
> > > >> >>>
> > > >> >>>
> > > >>
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > > >> >>>2
> > > >> >>>
> > > >> >>> Thanks.
> > > >> >>>
> > > >> >>> Jiangjie (Becket) Qin
> > > >> >>>
> > > >> >
> > > >>
> > > >>
> > >
> > >
> >
>
>
>
> --
> Thanks,
> Neha
>



-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Neha Narkhede <ne...@confluent.io>.

>
> And I also agree it is better if we can make producer block when
> close() is called from sender thread so user will notice something went
> wrong.


This isn't a great experience either. Why can't we just throw an exception
for a behavior we know is incorrect and we'd like the user to know.
Blocking as a means of doing that seems wrong and annoying.

On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <ja...@gmail.com> wrote:

> Cool.
>
> I think blocking is good or alternately throwing an exception directly from
> close(). Basically I would just worry about subtly doing something slightly
> different from what the user asked for as it will be hard to notice that
> behavior difference.
>
> -Jay
>
> On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
> wrote:
>
> > Hi Jay,
> >
> > I have modified the KIP as you suggested. I thinks as long as we have
> > consistent define for timeout across Kafka interface, there would be no
> > problem. And I also agree it is better if we can make producer block when
> > close() is called from sender thread so user will notice something went
> > wrong.
> >
> > Thanks.
> >
> > Jiangjie (Becket) Qin
> >
> > On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >
> > >Hey Jiangjie,
> > >
> > >I think this is going to be very confusing that
> > >  close(0) waits indefinitely and
> > >  close(-1) waits for 0.
> > >I understand this appears in other apis, but it is a constant cause of
> > >bugs. Let's not repeat that mistake.
> > >
> > >Let's make close(0) wait for 0. We don't need a way to wait indefinitely
> > >as
> > >we already have close() so having a magical constant for that is
> > >redundant.
> > >
> > >Calling close() from the I/O thread was already possible and would block
> > >indefinitely. I think trying to silently change the behavior is probably
> > >not right. I.e. if the user calls close() in the callback there is
> > >actually
> > >some misunderstanding and they need to think more, silently making this
> > >not
> > >block will hide the problem from them which is the opposite of what we
> > >want.
> > >
> > >-Jay
> > >
> > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> > >wrote:
> > >
> > >> Hey Joe & Jay,
> > >>
> > >> Thanks for the comments on the voting thread. Since it seems we
> probably
> > >> will have more discussion on this, I am just replying from the
> > >>discussion
> > >> thread here.
> > >> I’ve updated the KIP page to make it less like half-baked, apologize
> for
> > >> the rush...
> > >>
> > >> The contract in current KIP is:
> > >>   1. close() - wait until all requests either are sent or reach
> request
> > >> timeout.
> > >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e.
> Wait
> > >> until all requests are sent or reach request timeout
> > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
> > >>in 5
> > >> milliseconds, if something went wrong, just shutdown the producer
> > >>anyway,
> > >> my callback will handle the failures.
> > >>
> > >> About how we define what timeout value stands for, I actually
> struggled
> > >>a
> > >> little bit when wrote the patch. Intuitively, close(0) should mean
> > >> immediately, however it seems that all the existing java class have
> this
> > >> convention of timeout=0 means no timeout or never timeout
> > >>(Thread.join(0),
> > >> Object.wait(0), etc.) So here the dilemma is either we follow the
> > >> intuition or we follow the convention. What I chose is to follow the
> > >> convention but document the interface to let user be aware of the
> usage.
> > >> The reason is that I think producer.close() is a public interface so
> it
> > >> might be better to follow java convention. Whereas selector is not a
> > >> public interface that used by user, so as long as it makes sense to
> us,
> > >>it
> > >> is less a problem to be different from java convention. That said
> since
> > >> consumer.poll(timeout) is also a public interface, I think it also
> makes
> > >> sense to make producer.close() to have the same definition of
> > >> consumer.poll(timeout).
> > >>
> > >> The main argument for keeping a timeout in close would be separating
> the
> > >> close timeout from request timeout, which probably makes sense. I
> would
> > >> guess typically the request timeout would be long (e.g. 60 seconds)
> > >> because we might want to consider retries with back off time. If we
> have
> > >> multiple batches in accumulator, in worst case that could take up to
> > >> several minutes to complete all the requests. But when we close a
> > >> producer, we might not want to wait for that long as it might cause
> some
> > >> other problem like deployment tool timeout.
> > >>
> > >> There is also a subtle difference between close(timeout) and
> > >> flush(timeout). The only purpose for flush() is to write data to the
> > >> broker, so it makes perfect sense to wait until request timeout. I
> think
> > >> that is why flush(timeout) looks strange. On the other hand, the top
> > >> priority for close() is to close the producer rather than flush()
> data,
> > >>so
> > >> close(timeout) gives guarantee on bounded waiting for its main job.
> > >>
> > >> Sorry for the confusion about forceClose flag. It is not a public
> > >> interface. I mentioned it in Proposed Changes section which I thought
> > >>was
> > >> supposed to provide implementation details.
> > >>
> > >> Thanks again for all the comments and suggestions!
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> > >>
> > >> >The KIP page has been updated per Jay¹s comments.
> > >> >I¹d like to initiate the voting process if no further comments are
> > >> >received by tomorrow.
> > >> >
> > >> >Jiangjie (Becket) Qin
> > >> >
> > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> > >> >
> > >> >>Hey Jiangjie,
> > >> >>
> > >> >>Can you capture the full motivation and use cases for the feature?
> > >>This
> > >> >>mentions your interest in having a way of aborting from inside the
> > >> >>Callback. But it doesn't really explain that usage or why other
> people
> > >> >>would want to do that. It also doesn't list the primary use case for
> > >> >>having
> > >> >>close with a bounded timeout which was to avoid blocking too long on
> > >> >>shutdown.
> > >> >>
> > >> >>-Jay
> > >> >>
> > >> >>
> > >> >>
> > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> > >><jqin@linkedin.com.invalid
> > >> >
> > >> >>wrote:
> > >> >>
> > >> >>> Hi,
> > >> >>>
> > >> >>> I just created a KIP for adding a close(timeout) to new producer.
> > >>Most
> > >> >>>of
> > >> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
> > >>has
> > >> >>> already done a lot of work.
> > >> >>> Since this is an interface change so we are going through the KIP
> > >> >>>process.
> > >> >>> Here is the KIP link:
> > >> >>>
> > >> >>>
> > >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> > >> >>>2
> > >> >>>
> > >> >>> Thanks.
> > >> >>>
> > >> >>> Jiangjie (Becket) Qin
> > >> >>>
> > >> >
> > >>
> > >>
> >
> >
>



-- 
Thanks,
Neha

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Agree. Since throwing exception when close() is called in callback won’t
work because we are catching all the exceptions from callback, blocking
might be the only option we have here.

Jiangjie (Becket) Qin

On 3/15/15, 11:56 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>Cool.
>
>I think blocking is good or alternately throwing an exception directly
>from
>close(). Basically I would just worry about subtly doing something
>slightly
>different from what the user asked for as it will be hard to notice that
>behavior difference.
>
>-Jay
>
>On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hi Jay,
>>
>> I have modified the KIP as you suggested. I thinks as long as we have
>> consistent define for timeout across Kafka interface, there would be no
>> problem. And I also agree it is better if we can make producer block
>>when
>> close() is called from sender thread so user will notice something went
>> wrong.
>>
>> Thanks.
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>>
>> >Hey Jiangjie,
>> >
>> >I think this is going to be very confusing that
>> >  close(0) waits indefinitely and
>> >  close(-1) waits for 0.
>> >I understand this appears in other apis, but it is a constant cause of
>> >bugs. Let's not repeat that mistake.
>> >
>> >Let's make close(0) wait for 0. We don't need a way to wait
>>indefinitely
>> >as
>> >we already have close() so having a magical constant for that is
>> >redundant.
>> >
>> >Calling close() from the I/O thread was already possible and would
>>block
>> >indefinitely. I think trying to silently change the behavior is
>>probably
>> >not right. I.e. if the user calls close() in the callback there is
>> >actually
>> >some misunderstanding and they need to think more, silently making this
>> >not
>> >block will hide the problem from them which is the opposite of what we
>> >want.
>> >
>> >-Jay
>> >
>> >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> >wrote:
>> >
>> >> Hey Joe & Jay,
>> >>
>> >> Thanks for the comments on the voting thread. Since it seems we
>>probably
>> >> will have more discussion on this, I am just replying from the
>> >>discussion
>> >> thread here.
>> >> I’ve updated the KIP page to make it less like half-baked, apologize
>>for
>> >> the rush...
>> >>
>> >> The contract in current KIP is:
>> >>   1. close() - wait until all requests either are sent or reach
>>request
>> >> timeout.
>> >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e.
>>Wait
>> >> until all requests are sent or reach request timeout
>> >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
>> >>in 5
>> >> milliseconds, if something went wrong, just shutdown the producer
>> >>anyway,
>> >> my callback will handle the failures.
>> >>
>> >> About how we define what timeout value stands for, I actually
>>struggled
>> >>a
>> >> little bit when wrote the patch. Intuitively, close(0) should mean
>> >> immediately, however it seems that all the existing java class have
>>this
>> >> convention of timeout=0 means no timeout or never timeout
>> >>(Thread.join(0),
>> >> Object.wait(0), etc.) So here the dilemma is either we follow the
>> >> intuition or we follow the convention. What I chose is to follow the
>> >> convention but document the interface to let user be aware of the
>>usage.
>> >> The reason is that I think producer.close() is a public interface so
>>it
>> >> might be better to follow java convention. Whereas selector is not a
>> >> public interface that used by user, so as long as it makes sense to
>>us,
>> >>it
>> >> is less a problem to be different from java convention. That said
>>since
>> >> consumer.poll(timeout) is also a public interface, I think it also
>>makes
>> >> sense to make producer.close() to have the same definition of
>> >> consumer.poll(timeout).
>> >>
>> >> The main argument for keeping a timeout in close would be separating
>>the
>> >> close timeout from request timeout, which probably makes sense. I
>>would
>> >> guess typically the request timeout would be long (e.g. 60 seconds)
>> >> because we might want to consider retries with back off time. If we
>>have
>> >> multiple batches in accumulator, in worst case that could take up to
>> >> several minutes to complete all the requests. But when we close a
>> >> producer, we might not want to wait for that long as it might cause
>>some
>> >> other problem like deployment tool timeout.
>> >>
>> >> There is also a subtle difference between close(timeout) and
>> >> flush(timeout). The only purpose for flush() is to write data to the
>> >> broker, so it makes perfect sense to wait until request timeout. I
>>think
>> >> that is why flush(timeout) looks strange. On the other hand, the top
>> >> priority for close() is to close the producer rather than flush()
>>data,
>> >>so
>> >> close(timeout) gives guarantee on bounded waiting for its main job.
>> >>
>> >> Sorry for the confusion about forceClose flag. It is not a public
>> >> interface. I mentioned it in Proposed Changes section which I thought
>> >>was
>> >> supposed to provide implementation details.
>> >>
>> >> Thanks again for all the comments and suggestions!
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> >>
>> >> >The KIP page has been updated per Jay¹s comments.
>> >> >I¹d like to initiate the voting process if no further comments are
>> >> >received by tomorrow.
>> >> >
>> >> >Jiangjie (Becket) Qin
>> >> >
>> >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >> >
>> >> >>Hey Jiangjie,
>> >> >>
>> >> >>Can you capture the full motivation and use cases for the feature?
>> >>This
>> >> >>mentions your interest in having a way of aborting from inside the
>> >> >>Callback. But it doesn't really explain that usage or why other
>>people
>> >> >>would want to do that. It also doesn't list the primary use case
>>for
>> >> >>having
>> >> >>close with a bounded timeout which was to avoid blocking too long
>>on
>> >> >>shutdown.
>> >> >>
>> >> >>-Jay
>> >> >>
>> >> >>
>> >> >>
>> >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >><jqin@linkedin.com.invalid
>> >> >
>> >> >>wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I just created a KIP for adding a close(timeout) to new producer.
>> >>Most
>> >> >>>of
>> >> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
>> >>has
>> >> >>> already done a lot of work.
>> >> >>> Since this is an interface change so we are going through the KIP
>> >> >>>process.
>> >> >>> Here is the KIP link:
>> >> >>>
>> >> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >> >>>2
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>> Jiangjie (Becket) Qin
>> >> >>>
>> >> >
>> >>
>> >>
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Cool.

I think blocking is good or alternately throwing an exception directly from
close(). Basically I would just worry about subtly doing something slightly
different from what the user asked for as it will be hard to notice that
behavior difference.

-Jay

On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi Jay,
>
> I have modified the KIP as you suggested. I thinks as long as we have
> consistent define for timeout across Kafka interface, there would be no
> problem. And I also agree it is better if we can make producer block when
> close() is called from sender thread so user will notice something went
> wrong.
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
> On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>
> >Hey Jiangjie,
> >
> >I think this is going to be very confusing that
> >  close(0) waits indefinitely and
> >  close(-1) waits for 0.
> >I understand this appears in other apis, but it is a constant cause of
> >bugs. Let's not repeat that mistake.
> >
> >Let's make close(0) wait for 0. We don't need a way to wait indefinitely
> >as
> >we already have close() so having a magical constant for that is
> >redundant.
> >
> >Calling close() from the I/O thread was already possible and would block
> >indefinitely. I think trying to silently change the behavior is probably
> >not right. I.e. if the user calls close() in the callback there is
> >actually
> >some misunderstanding and they need to think more, silently making this
> >not
> >block will hide the problem from them which is the opposite of what we
> >want.
> >
> >-Jay
> >
> >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> >wrote:
> >
> >> Hey Joe & Jay,
> >>
> >> Thanks for the comments on the voting thread. Since it seems we probably
> >> will have more discussion on this, I am just replying from the
> >>discussion
> >> thread here.
> >> I’ve updated the KIP page to make it less like half-baked, apologize for
> >> the rush...
> >>
> >> The contract in current KIP is:
> >>   1. close() - wait until all requests either are sent or reach request
> >> timeout.
> >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
> >> until all requests are sent or reach request timeout
> >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
> >>in 5
> >> milliseconds, if something went wrong, just shutdown the producer
> >>anyway,
> >> my callback will handle the failures.
> >>
> >> About how we define what timeout value stands for, I actually struggled
> >>a
> >> little bit when wrote the patch. Intuitively, close(0) should mean
> >> immediately, however it seems that all the existing java class have this
> >> convention of timeout=0 means no timeout or never timeout
> >>(Thread.join(0),
> >> Object.wait(0), etc.) So here the dilemma is either we follow the
> >> intuition or we follow the convention. What I chose is to follow the
> >> convention but document the interface to let user be aware of the usage.
> >> The reason is that I think producer.close() is a public interface so it
> >> might be better to follow java convention. Whereas selector is not a
> >> public interface that used by user, so as long as it makes sense to us,
> >>it
> >> is less a problem to be different from java convention. That said since
> >> consumer.poll(timeout) is also a public interface, I think it also makes
> >> sense to make producer.close() to have the same definition of
> >> consumer.poll(timeout).
> >>
> >> The main argument for keeping a timeout in close would be separating the
> >> close timeout from request timeout, which probably makes sense. I would
> >> guess typically the request timeout would be long (e.g. 60 seconds)
> >> because we might want to consider retries with back off time. If we have
> >> multiple batches in accumulator, in worst case that could take up to
> >> several minutes to complete all the requests. But when we close a
> >> producer, we might not want to wait for that long as it might cause some
> >> other problem like deployment tool timeout.
> >>
> >> There is also a subtle difference between close(timeout) and
> >> flush(timeout). The only purpose for flush() is to write data to the
> >> broker, so it makes perfect sense to wait until request timeout. I think
> >> that is why flush(timeout) looks strange. On the other hand, the top
> >> priority for close() is to close the producer rather than flush() data,
> >>so
> >> close(timeout) gives guarantee on bounded waiting for its main job.
> >>
> >> Sorry for the confusion about forceClose flag. It is not a public
> >> interface. I mentioned it in Proposed Changes section which I thought
> >>was
> >> supposed to provide implementation details.
> >>
> >> Thanks again for all the comments and suggestions!
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >>
> >> >The KIP page has been updated per Jay¹s comments.
> >> >I¹d like to initiate the voting process if no further comments are
> >> >received by tomorrow.
> >> >
> >> >Jiangjie (Becket) Qin
> >> >
> >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> >
> >> >>Hey Jiangjie,
> >> >>
> >> >>Can you capture the full motivation and use cases for the feature?
> >>This
> >> >>mentions your interest in having a way of aborting from inside the
> >> >>Callback. But it doesn't really explain that usage or why other people
> >> >>would want to do that. It also doesn't list the primary use case for
> >> >>having
> >> >>close with a bounded timeout which was to avoid blocking too long on
> >> >>shutdown.
> >> >>
> >> >>-Jay
> >> >>
> >> >>
> >> >>
> >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >><jqin@linkedin.com.invalid
> >> >
> >> >>wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> I just created a KIP for adding a close(timeout) to new producer.
> >>Most
> >> >>>of
> >> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
> >>has
> >> >>> already done a lot of work.
> >> >>> Since this is an interface change so we are going through the KIP
> >> >>>process.
> >> >>> Here is the KIP link:
> >> >>>
> >> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >>>2
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> Jiangjie (Becket) Qin
> >> >>>
> >> >
> >>
> >>
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Jay,

I have modified the KIP as you suggested. I thinks as long as we have
consistent define for timeout across Kafka interface, there would be no
problem. And I also agree it is better if we can make producer block when
close() is called from sender thread so user will notice something went
wrong.

Thanks.

Jiangjie (Becket) Qin

On 3/14/15, 11:37 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>Hey Jiangjie,
>
>I think this is going to be very confusing that
>  close(0) waits indefinitely and
>  close(-1) waits for 0.
>I understand this appears in other apis, but it is a constant cause of
>bugs. Let's not repeat that mistake.
>
>Let's make close(0) wait for 0. We don't need a way to wait indefinitely
>as
>we already have close() so having a magical constant for that is
>redundant.
>
>Calling close() from the I/O thread was already possible and would block
>indefinitely. I think trying to silently change the behavior is probably
>not right. I.e. if the user calls close() in the callback there is
>actually
>some misunderstanding and they need to think more, silently making this
>not
>block will hide the problem from them which is the opposite of what we
>want.
>
>-Jay
>
>On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hey Joe & Jay,
>>
>> Thanks for the comments on the voting thread. Since it seems we probably
>> will have more discussion on this, I am just replying from the
>>discussion
>> thread here.
>> I’ve updated the KIP page to make it less like half-baked, apologize for
>> the rush...
>>
>> The contract in current KIP is:
>>   1. close() - wait until all requests either are sent or reach request
>> timeout.
>>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
>> until all requests are sent or reach request timeout
>>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
>>in 5
>> milliseconds, if something went wrong, just shutdown the producer
>>anyway,
>> my callback will handle the failures.
>>
>> About how we define what timeout value stands for, I actually struggled
>>a
>> little bit when wrote the patch. Intuitively, close(0) should mean
>> immediately, however it seems that all the existing java class have this
>> convention of timeout=0 means no timeout or never timeout
>>(Thread.join(0),
>> Object.wait(0), etc.) So here the dilemma is either we follow the
>> intuition or we follow the convention. What I chose is to follow the
>> convention but document the interface to let user be aware of the usage.
>> The reason is that I think producer.close() is a public interface so it
>> might be better to follow java convention. Whereas selector is not a
>> public interface that used by user, so as long as it makes sense to us,
>>it
>> is less a problem to be different from java convention. That said since
>> consumer.poll(timeout) is also a public interface, I think it also makes
>> sense to make producer.close() to have the same definition of
>> consumer.poll(timeout).
>>
>> The main argument for keeping a timeout in close would be separating the
>> close timeout from request timeout, which probably makes sense. I would
>> guess typically the request timeout would be long (e.g. 60 seconds)
>> because we might want to consider retries with back off time. If we have
>> multiple batches in accumulator, in worst case that could take up to
>> several minutes to complete all the requests. But when we close a
>> producer, we might not want to wait for that long as it might cause some
>> other problem like deployment tool timeout.
>>
>> There is also a subtle difference between close(timeout) and
>> flush(timeout). The only purpose for flush() is to write data to the
>> broker, so it makes perfect sense to wait until request timeout. I think
>> that is why flush(timeout) looks strange. On the other hand, the top
>> priority for close() is to close the producer rather than flush() data,
>>so
>> close(timeout) gives guarantee on bounded waiting for its main job.
>>
>> Sorry for the confusion about forceClose flag. It is not a public
>> interface. I mentioned it in Proposed Changes section which I thought
>>was
>> supposed to provide implementation details.
>>
>> Thanks again for all the comments and suggestions!
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>>
>> >The KIP page has been updated per Jay¹s comments.
>> >I¹d like to initiate the voting process if no further comments are
>> >received by tomorrow.
>> >
>> >Jiangjie (Becket) Qin
>> >
>> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >
>> >>Hey Jiangjie,
>> >>
>> >>Can you capture the full motivation and use cases for the feature?
>>This
>> >>mentions your interest in having a way of aborting from inside the
>> >>Callback. But it doesn't really explain that usage or why other people
>> >>would want to do that. It also doesn't list the primary use case for
>> >>having
>> >>close with a bounded timeout which was to avoid blocking too long on
>> >>shutdown.
>> >>
>> >>-Jay
>> >>
>> >>
>> >>
>> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> >>wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I just created a KIP for adding a close(timeout) to new producer.
>>Most
>> >>>of
>> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
>>has
>> >>> already done a lot of work.
>> >>> Since this is an interface change so we are going through the KIP
>> >>>process.
>> >>> Here is the KIP link:
>> >>>
>> >>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >>>2
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Jiangjie (Becket) Qin
>> >>>
>> >
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Hey Jiangjie,

I think this is going to be very confusing that
  close(0) waits indefinitely and
  close(-1) waits for 0.
I understand this appears in other apis, but it is a constant cause of
bugs. Let's not repeat that mistake.

Let's make close(0) wait for 0. We don't need a way to wait indefinitely as
we already have close() so having a magical constant for that is redundant.

Calling close() from the I/O thread was already possible and would block
indefinitely. I think trying to silently change the behavior is probably
not right. I.e. if the user calls close() in the callback there is actually
some misunderstanding and they need to think more, silently making this not
block will hide the problem from them which is the opposite of what we want.

-Jay

On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hey Joe & Jay,
>
> Thanks for the comments on the voting thread. Since it seems we probably
> will have more discussion on this, I am just replying from the discussion
> thread here.
> I’ve updated the KIP page to make it less like half-baked, apologize for
> the rush...
>
> The contract in current KIP is:
>   1. close() - wait until all requests either are sent or reach request
> timeout.
>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
> until all requests are sent or reach request timeout
>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending in 5
> milliseconds, if something went wrong, just shutdown the producer anyway,
> my callback will handle the failures.
>
> About how we define what timeout value stands for, I actually struggled a
> little bit when wrote the patch. Intuitively, close(0) should mean
> immediately, however it seems that all the existing java class have this
> convention of timeout=0 means no timeout or never timeout (Thread.join(0),
> Object.wait(0), etc.) So here the dilemma is either we follow the
> intuition or we follow the convention. What I chose is to follow the
> convention but document the interface to let user be aware of the usage.
> The reason is that I think producer.close() is a public interface so it
> might be better to follow java convention. Whereas selector is not a
> public interface that used by user, so as long as it makes sense to us, it
> is less a problem to be different from java convention. That said since
> consumer.poll(timeout) is also a public interface, I think it also makes
> sense to make producer.close() to have the same definition of
> consumer.poll(timeout).
>
> The main argument for keeping a timeout in close would be separating the
> close timeout from request timeout, which probably makes sense. I would
> guess typically the request timeout would be long (e.g. 60 seconds)
> because we might want to consider retries with back off time. If we have
> multiple batches in accumulator, in worst case that could take up to
> several minutes to complete all the requests. But when we close a
> producer, we might not want to wait for that long as it might cause some
> other problem like deployment tool timeout.
>
> There is also a subtle difference between close(timeout) and
> flush(timeout). The only purpose for flush() is to write data to the
> broker, so it makes perfect sense to wait until request timeout. I think
> that is why flush(timeout) looks strange. On the other hand, the top
> priority for close() is to close the producer rather than flush() data, so
> close(timeout) gives guarantee on bounded waiting for its main job.
>
> Sorry for the confusion about forceClose flag. It is not a public
> interface. I mentioned it in Proposed Changes section which I thought was
> supposed to provide implementation details.
>
> Thanks again for all the comments and suggestions!
>
> Jiangjie (Becket) Qin
>
> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>
> >The KIP page has been updated per Jay¹s comments.
> >I¹d like to initiate the voting process if no further comments are
> >received by tomorrow.
> >
> >Jiangjie (Becket) Qin
> >
> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >
> >>Hey Jiangjie,
> >>
> >>Can you capture the full motivation and use cases for the feature? This
> >>mentions your interest in having a way of aborting from inside the
> >>Callback. But it doesn't really explain that usage or why other people
> >>would want to do that. It also doesn't list the primary use case for
> >>having
> >>close with a bounded timeout which was to avoid blocking too long on
> >>shutdown.
> >>
> >>-Jay
> >>
> >>
> >>
> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> >>wrote:
> >>
> >>> Hi,
> >>>
> >>> I just created a KIP for adding a close(timeout) to new producer. Most
> >>>of
> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
> >>> already done a lot of work.
> >>> Since this is an interface change so we are going through the KIP
> >>>process.
> >>> Here is the KIP link:
> >>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >>>2
> >>>
> >>> Thanks.
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Sounds reasonable. I have updated the KIP page accordingly.

On 3/12/15, 10:16 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

>3) I think this is fine.
>4) Hmm, error-message-only may NOT be better than blocking, as with the
>code written with close(>=0), it will likely to just pollute the logs with
>repeating errors until someone gets notified. How about
>error-message-and-close(-1), i.e. record the error and force shutdown?
>
>On Thu, Mar 12, 2015 at 11:50 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hey Guozhang,
>>
>> Thanks for the comments. I updated the page as suggested.
>> For 3), that’s right, I put this in java doc. Do you think we need to
>> reject value other than -1?
>> For 4), I think user will notice this easily because the thread will
>>block
>> and producer is not going to shutdown. About using close(-1) quietly in
>> sender thread when close() is called, my concern is that user might not
>>be
>> aware of that. Maybe we can put an error messages if user call close()
>>in
>> sender thread?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>>
>> >Hi Becket,
>> >
>> >Some comments on the wiki page:
>> >
>> >1. There are a few typos, for example "multivations", "wiithin".
>> >
>> >2. I think the main motivation could just be "current close() needs to
>> >block on flushing all buffered data, however there are scenarios in
>>which
>> >producers would like to close without blocking on flushing data, or
>>even
>> >close immediately and make sure any buffered data are dropped instead
>>of
>> >sending out." You can probably give some examples for such scenarios.
>> >
>> >3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems
>> >any
>> >negative value has the same semantics.
>> >
>> >4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be
>>called
>> >=> this is not programmatically enforced. Shall we just try to enforce
>>it
>> >via, for example, checking if caller thread is the IO thread, and if
>>yes
>> >just use close(-1)?
>> >
>> >5. Proposed Changes => it seems you only talked about the close(-1)
>>case,
>> >how about close(>=0)?
>> >
>> >Guozhang
>> >
>> >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>><jq...@linkedin.com.invalid>
>> >wrote:
>> >
>> >> Hey Joe & Jay,
>> >>
>> >> Thanks for the comments on the voting thread. Since it seems we
>>probably
>> >> will have more discussion on this, I am just replying from the
>> >>discussion
>> >> thread here.
>> >> I’ve updated the KIP page to make it less like half-baked, apologize
>>for
>> >> the rush...
>> >>
>> >> The contract in current KIP is:
>> >>   1. close() - wait until all requests either are sent or reach
>>request
>> >> timeout.
>> >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e.
>>Wait
>> >> until all requests are sent or reach request timeout
>> >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
>> >>in 5
>> >> milliseconds, if something went wrong, just shutdown the producer
>> >>anyway,
>> >> my callback will handle the failures.
>> >>
>> >> About how we define what timeout value stands for, I actually
>>struggled
>> >>a
>> >> little bit when wrote the patch. Intuitively, close(0) should mean
>> >> immediately, however it seems that all the existing java class have
>>this
>> >> convention of timeout=0 means no timeout or never timeout
>> >>(Thread.join(0),
>> >> Object.wait(0), etc.) So here the dilemma is either we follow the
>> >> intuition or we follow the convention. What I chose is to follow the
>> >> convention but document the interface to let user be aware of the
>>usage.
>> >> The reason is that I think producer.close() is a public interface so
>>it
>> >> might be better to follow java convention. Whereas selector is not a
>> >> public interface that used by user, so as long as it makes sense to
>>us,
>> >>it
>> >> is less a problem to be different from java convention. That said
>>since
>> >> consumer.poll(timeout) is also a public interface, I think it also
>>makes
>> >> sense to make producer.close() to have the same definition of
>> >> consumer.poll(timeout).
>> >>
>> >> The main argument for keeping a timeout in close would be separating
>>the
>> >> close timeout from request timeout, which probably makes sense. I
>>would
>> >> guess typically the request timeout would be long (e.g. 60 seconds)
>> >> because we might want to consider retries with back off time. If we
>>have
>> >> multiple batches in accumulator, in worst case that could take up to
>> >> several minutes to complete all the requests. But when we close a
>> >> producer, we might not want to wait for that long as it might cause
>>some
>> >> other problem like deployment tool timeout.
>> >>
>> >> There is also a subtle difference between close(timeout) and
>> >> flush(timeout). The only purpose for flush() is to write data to the
>> >> broker, so it makes perfect sense to wait until request timeout. I
>>think
>> >> that is why flush(timeout) looks strange. On the other hand, the top
>> >> priority for close() is to close the producer rather than flush()
>>data,
>> >>so
>> >> close(timeout) gives guarantee on bounded waiting for its main job.
>> >>
>> >> Sorry for the confusion about forceClose flag. It is not a public
>> >> interface. I mentioned it in Proposed Changes section which I thought
>> >>was
>> >> supposed to provide implementation details.
>> >>
>> >> Thanks again for all the comments and suggestions!
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> >>
>> >> >The KIP page has been updated per Jay¹s comments.
>> >> >I¹d like to initiate the voting process if no further comments are
>> >> >received by tomorrow.
>> >> >
>> >> >Jiangjie (Becket) Qin
>> >> >
>> >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >> >
>> >> >>Hey Jiangjie,
>> >> >>
>> >> >>Can you capture the full motivation and use cases for the feature?
>> >>This
>> >> >>mentions your interest in having a way of aborting from inside the
>> >> >>Callback. But it doesn't really explain that usage or why other
>>people
>> >> >>would want to do that. It also doesn't list the primary use case
>>for
>> >> >>having
>> >> >>close with a bounded timeout which was to avoid blocking too long
>>on
>> >> >>shutdown.
>> >> >>
>> >> >>-Jay
>> >> >>
>> >> >>
>> >> >>
>> >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >><jqin@linkedin.com.invalid
>> >> >
>> >> >>wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I just created a KIP for adding a close(timeout) to new producer.
>> >>Most
>> >> >>>of
>> >> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
>> >>has
>> >> >>> already done a lot of work.
>> >> >>> Since this is an interface change so we are going through the KIP
>> >> >>>process.
>> >> >>> Here is the KIP link:
>> >> >>>
>> >> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >> >>>2
>> >> >>>
>> >> >>> Thanks.
>> >> >>>
>> >> >>> Jiangjie (Becket) Qin
>> >> >>>
>> >> >
>> >>
>> >>
>> >
>> >
>> >--
>> >-- Guozhang
>>
>>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

3) I think this is fine.
4) Hmm, error-message-only may NOT be better than blocking, as with the
code written with close(>=0), it will likely to just pollute the logs with
repeating errors until someone gets notified. How about
error-message-and-close(-1), i.e. record the error and force shutdown?

On Thu, Mar 12, 2015 at 11:50 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hey Guozhang,
>
> Thanks for the comments. I updated the page as suggested.
> For 3), that’s right, I put this in java doc. Do you think we need to
> reject value other than -1?
> For 4), I think user will notice this easily because the thread will block
> and producer is not going to shutdown. About using close(-1) quietly in
> sender thread when close() is called, my concern is that user might not be
> aware of that. Maybe we can put an error messages if user call close() in
> sender thread?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
> >Hi Becket,
> >
> >Some comments on the wiki page:
> >
> >1. There are a few typos, for example "multivations", "wiithin".
> >
> >2. I think the main motivation could just be "current close() needs to
> >block on flushing all buffered data, however there are scenarios in which
> >producers would like to close without blocking on flushing data, or even
> >close immediately and make sure any buffered data are dropped instead of
> >sending out." You can probably give some examples for such scenarios.
> >
> >3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems
> >any
> >negative value has the same semantics.
> >
> >4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be called
> >=> this is not programmatically enforced. Shall we just try to enforce it
> >via, for example, checking if caller thread is the IO thread, and if yes
> >just use close(-1)?
> >
> >5. Proposed Changes => it seems you only talked about the close(-1) case,
> >how about close(>=0)?
> >
> >Guozhang
> >
> >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
> >wrote:
> >
> >> Hey Joe & Jay,
> >>
> >> Thanks for the comments on the voting thread. Since it seems we probably
> >> will have more discussion on this, I am just replying from the
> >>discussion
> >> thread here.
> >> I’ve updated the KIP page to make it less like half-baked, apologize for
> >> the rush...
> >>
> >> The contract in current KIP is:
> >>   1. close() - wait until all requests either are sent or reach request
> >> timeout.
> >>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
> >> until all requests are sent or reach request timeout
> >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
> >>in 5
> >> milliseconds, if something went wrong, just shutdown the producer
> >>anyway,
> >> my callback will handle the failures.
> >>
> >> About how we define what timeout value stands for, I actually struggled
> >>a
> >> little bit when wrote the patch. Intuitively, close(0) should mean
> >> immediately, however it seems that all the existing java class have this
> >> convention of timeout=0 means no timeout or never timeout
> >>(Thread.join(0),
> >> Object.wait(0), etc.) So here the dilemma is either we follow the
> >> intuition or we follow the convention. What I chose is to follow the
> >> convention but document the interface to let user be aware of the usage.
> >> The reason is that I think producer.close() is a public interface so it
> >> might be better to follow java convention. Whereas selector is not a
> >> public interface that used by user, so as long as it makes sense to us,
> >>it
> >> is less a problem to be different from java convention. That said since
> >> consumer.poll(timeout) is also a public interface, I think it also makes
> >> sense to make producer.close() to have the same definition of
> >> consumer.poll(timeout).
> >>
> >> The main argument for keeping a timeout in close would be separating the
> >> close timeout from request timeout, which probably makes sense. I would
> >> guess typically the request timeout would be long (e.g. 60 seconds)
> >> because we might want to consider retries with back off time. If we have
> >> multiple batches in accumulator, in worst case that could take up to
> >> several minutes to complete all the requests. But when we close a
> >> producer, we might not want to wait for that long as it might cause some
> >> other problem like deployment tool timeout.
> >>
> >> There is also a subtle difference between close(timeout) and
> >> flush(timeout). The only purpose for flush() is to write data to the
> >> broker, so it makes perfect sense to wait until request timeout. I think
> >> that is why flush(timeout) looks strange. On the other hand, the top
> >> priority for close() is to close the producer rather than flush() data,
> >>so
> >> close(timeout) gives guarantee on bounded waiting for its main job.
> >>
> >> Sorry for the confusion about forceClose flag. It is not a public
> >> interface. I mentioned it in Proposed Changes section which I thought
> >>was
> >> supposed to provide implementation details.
> >>
> >> Thanks again for all the comments and suggestions!
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >>
> >> >The KIP page has been updated per Jay¹s comments.
> >> >I¹d like to initiate the voting process if no further comments are
> >> >received by tomorrow.
> >> >
> >> >Jiangjie (Becket) Qin
> >> >
> >> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >> >
> >> >>Hey Jiangjie,
> >> >>
> >> >>Can you capture the full motivation and use cases for the feature?
> >>This
> >> >>mentions your interest in having a way of aborting from inside the
> >> >>Callback. But it doesn't really explain that usage or why other people
> >> >>would want to do that. It also doesn't list the primary use case for
> >> >>having
> >> >>close with a bounded timeout which was to avoid blocking too long on
> >> >>shutdown.
> >> >>
> >> >>-Jay
> >> >>
> >> >>
> >> >>
> >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >><jqin@linkedin.com.invalid
> >> >
> >> >>wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> I just created a KIP for adding a close(timeout) to new producer.
> >>Most
> >> >>>of
> >> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
> >>has
> >> >>> already done a lot of work.
> >> >>> Since this is an interface change so we are going through the KIP
> >> >>>process.
> >> >>> Here is the KIP link:
> >> >>>
> >> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >>>2
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>> Jiangjie (Becket) Qin
> >> >>>
> >> >
> >>
> >>
> >
> >
> >--
> >-- Guozhang
>
>


-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi Jay,

Currently I can see use case for 4).
We’ve seen some application in LinkedIn having issue of undeployment
failure because of timeout, it was because many Kafka consumers were
shutdown at the same time and caused serialized consumer rebalance until
retries are exhausted. I guess in practice normally producer shutdown
won’t take that long, but theoretically it is possible. I remember Andrew
mentioned similar use case in KAFKA-1660. We can also ask him about it.
Maybe it depends on how we implement request timeout but currently I am
not sure whether merely request timeout itself would be effective enough
to limit shutdown time as required.

For 2), I don’t have a real world use case now. But I would argue it is
very likely a valid use case. Imagine in the callback, user wants to do
some stuff (checkpoint maybe?) but failed, in this case user probably want
to stop sending further messages even though the message that triggered
the callback was sent successfully.

What do you think?

Thanks.

Jiangjie (Becket) Qin



On 3/12/15, 8:11 PM, "Jay Kreps" <ja...@gmail.com> wrote:

>Hey Jiangjie,
>
>Thanks for the awesome summary, I think that really captures it. I agree,
>the question is exactly whether there would be a motivational use case we
>can brainstorm for those additional use cases?
>
>-Jay
>
>On Thu, Mar 12, 2015 at 4:24 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hi folks,
>>
>> Since there are actually a long interleaved discussions in KAFKA-1660
>>and
>> KAFKA-1659, I summarized them in the rejected alternatives sections -
>>the
>> section exists for a reason and I shouldn’t have always put None there.
>>.
>> . It looks we took a while to agree upon the current interface.
>>
>> We basically has the following requirements:
>>
>> *Assuming request timeout is available*
>>
>> 1. Need to close a producer forcefully if a message send failed.
>> 2. Need to close a producer forcefully even if no message send failed.
>> 3. Need to close a producer elegantly but with a bounded waiting time.
>> 4. Need to close a producer forcefully after an arbitrary waiting time
>> other than request timeout.
>>
>> The options we have now are ():
>> A. close(timeout) - address all four.
>> B. Close() + abort.on.send.fail -  address 1) and 3)
>> C. abort() + close() - address 1) and 2) and 3)
>>
>> So I think we can narrow the discussion to be whether 2) and 4) are
>>valid
>> use case or not. As soon as we reach conclusion on what requirements
>>need
>> to be met, we would have a conclusion to the solution.
>> I think 2) might be a valid because of some application level logic.
>> I am not sure if 4) is a valid requirement, but form KAFKA-1660, my
>> understanding is it is.
>>
>> Any idea?
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On 3/12/15, 11:50 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>>
>> >Hey Guozhang,
>> >
>> >Thanks for the comments. I updated the page as suggested.
>> >For 3), that’s right, I put this in java doc. Do you think we need to
>> >reject value other than -1?
>> >For 4), I think user will notice this easily because the thread will
>>block
>> >and producer is not going to shutdown. About using close(-1) quietly in
>> >sender thread when close() is called, my concern is that user might
>>not be
>> >aware of that. Maybe we can put an error messages if user call close()
>>in
>> >sender thread?
>> >
>> >Thanks,
>> >
>> >Jiangjie (Becket) Qin
>> >
>> >On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>> >
>> >>Hi Becket,
>> >>
>> >>Some comments on the wiki page:
>> >>
>> >>1. There are a few typos, for example "multivations", "wiithin".
>> >>
>> >>2. I think the main motivation could just be "current close() needs to
>> >>block on flushing all buffered data, however there are scenarios in
>>which
>> >>producers would like to close without blocking on flushing data, or
>>even
>> >>close immediately and make sure any buffered data are dropped instead
>>of
>> >>sending out." You can probably give some examples for such scenarios.
>> >>
>> >>3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it
>>seems
>> >>any
>> >>negative value has the same semantics.
>> >>
>> >>4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be
>> >>called
>> >>=> this is not programmatically enforced. Shall we just try to
>>enforce it
>> >>via, for example, checking if caller thread is the IO thread, and if
>>yes
>> >>just use close(-1)?
>> >>
>> >>5. Proposed Changes => it seems you only talked about the close(-1)
>>case,
>> >>how about close(>=0)?
>> >>
>> >>Guozhang
>> >>
>> >>On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> >>wrote:
>> >>
>> >>> Hey Joe & Jay,
>> >>>
>> >>> Thanks for the comments on the voting thread. Since it seems we
>> >>>probably
>> >>> will have more discussion on this, I am just replying from the
>> >>>discussion
>> >>> thread here.
>> >>> I’ve updated the KIP page to make it less like half-baked, apologize
>> >>>for
>> >>> the rush...
>> >>>
>> >>> The contract in current KIP is:
>> >>>   1. close() - wait until all requests either are sent or reach
>>request
>> >>> timeout.
>> >>>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>> >>>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e.
>>Wait
>> >>> until all requests are sent or reach request timeout
>> >>>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish
>>sending
>> >>>in 5
>> >>> milliseconds, if something went wrong, just shutdown the producer
>> >>>anyway,
>> >>> my callback will handle the failures.
>> >>>
>> >>> About how we define what timeout value stands for, I actually
>>struggled
>> >>>a
>> >>> little bit when wrote the patch. Intuitively, close(0) should mean
>> >>> immediately, however it seems that all the existing java class have
>> >>>this
>> >>> convention of timeout=0 means no timeout or never timeout
>> >>>(Thread.join(0),
>> >>> Object.wait(0), etc.) So here the dilemma is either we follow the
>> >>> intuition or we follow the convention. What I chose is to follow the
>> >>> convention but document the interface to let user be aware of the
>> >>>usage.
>> >>> The reason is that I think producer.close() is a public interface
>>so it
>> >>> might be better to follow java convention. Whereas selector is not a
>> >>> public interface that used by user, so as long as it makes sense to
>>us,
>> >>>it
>> >>> is less a problem to be different from java convention. That said
>>since
>> >>> consumer.poll(timeout) is also a public interface, I think it also
>> >>>makes
>> >>> sense to make producer.close() to have the same definition of
>> >>> consumer.poll(timeout).
>> >>>
>> >>> The main argument for keeping a timeout in close would be separating
>> >>>the
>> >>> close timeout from request timeout, which probably makes sense. I
>>would
>> >>> guess typically the request timeout would be long (e.g. 60 seconds)
>> >>> because we might want to consider retries with back off time. If we
>> >>>have
>> >>> multiple batches in accumulator, in worst case that could take up to
>> >>> several minutes to complete all the requests. But when we close a
>> >>> producer, we might not want to wait for that long as it might cause
>> >>>some
>> >>> other problem like deployment tool timeout.
>> >>>
>> >>> There is also a subtle difference between close(timeout) and
>> >>> flush(timeout). The only purpose for flush() is to write data to the
>> >>> broker, so it makes perfect sense to wait until request timeout. I
>> >>>think
>> >>> that is why flush(timeout) looks strange. On the other hand, the top
>> >>> priority for close() is to close the producer rather than flush()
>>data,
>> >>>so
>> >>> close(timeout) gives guarantee on bounded waiting for its main job.
>> >>>
>> >>> Sorry for the confusion about forceClose flag. It is not a public
>> >>> interface. I mentioned it in Proposed Changes section which I
>>thought
>> >>>was
>> >>> supposed to provide implementation details.
>> >>>
>> >>> Thanks again for all the comments and suggestions!
>> >>>
>> >>> Jiangjie (Becket) Qin
>> >>>
>> >>> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>> >>>
>> >>> >The KIP page has been updated per Jay¹s comments.
>> >>> >I¹d like to initiate the voting process if no further comments are
>> >>> >received by tomorrow.
>> >>> >
>> >>> >Jiangjie (Becket) Qin
>> >>> >
>> >>> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >>> >
>> >>> >>Hey Jiangjie,
>> >>> >>
>> >>> >>Can you capture the full motivation and use cases for the feature?
>> >>>This
>> >>> >>mentions your interest in having a way of aborting from inside the
>> >>> >>Callback. But it doesn't really explain that usage or why other
>> >>>people
>> >>> >>would want to do that. It also doesn't list the primary use case
>>for
>> >>> >>having
>> >>> >>close with a bounded timeout which was to avoid blocking too long
>>on
>> >>> >>shutdown.
>> >>> >>
>> >>> >>-Jay
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>> >>><jqin@linkedin.com.invalid
>> >>> >
>> >>> >>wrote:
>> >>> >>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> I just created a KIP for adding a close(timeout) to new
>>producer.
>> >>>Most
>> >>> >>>of
>> >>> >>> the previous discussions are in KAFKA-1660 where Parth
>>Brahmbhatt
>> >>>has
>> >>> >>> already done a lot of work.
>> >>> >>> Since this is an interface change so we are going through the
>>KIP
>> >>> >>>process.
>> >>> >>> Here is the KIP link:
>> >>> >>>
>> >>> >>>
>> >>>
>> >>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >>> >>>2
>> >>> >>>
>> >>> >>> Thanks.
>> >>> >>>
>> >>> >>> Jiangjie (Becket) Qin
>> >>> >>>
>> >>> >
>> >>>
>> >>>
>> >>
>> >>
>> >>--
>> >>-- Guozhang
>> >
>>
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Hey Jiangjie,

Thanks for the awesome summary, I think that really captures it. I agree,
the question is exactly whether there would be a motivational use case we
can brainstorm for those additional use cases?

-Jay

On Thu, Mar 12, 2015 at 4:24 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi folks,
>
> Since there are actually a long interleaved discussions in KAFKA-1660 and
> KAFKA-1659, I summarized them in the rejected alternatives sections - the
> section exists for a reason and I shouldn’t have always put None there. .
> . It looks we took a while to agree upon the current interface.
>
> We basically has the following requirements:
>
> *Assuming request timeout is available*
>
> 1. Need to close a producer forcefully if a message send failed.
> 2. Need to close a producer forcefully even if no message send failed.
> 3. Need to close a producer elegantly but with a bounded waiting time.
> 4. Need to close a producer forcefully after an arbitrary waiting time
> other than request timeout.
>
> The options we have now are ():
> A. close(timeout) - address all four.
> B. Close() + abort.on.send.fail -  address 1) and 3)
> C. abort() + close() - address 1) and 2) and 3)
>
> So I think we can narrow the discussion to be whether 2) and 4) are valid
> use case or not. As soon as we reach conclusion on what requirements need
> to be met, we would have a conclusion to the solution.
> I think 2) might be a valid because of some application level logic.
> I am not sure if 4) is a valid requirement, but form KAFKA-1660, my
> understanding is it is.
>
> Any idea?
>
> Jiangjie (Becket) Qin
>
>
> On 3/12/15, 11:50 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>
> >Hey Guozhang,
> >
> >Thanks for the comments. I updated the page as suggested.
> >For 3), that’s right, I put this in java doc. Do you think we need to
> >reject value other than -1?
> >For 4), I think user will notice this easily because the thread will block
> >and producer is not going to shutdown. About using close(-1) quietly in
> >sender thread when close() is called, my concern is that user might not be
> >aware of that. Maybe we can put an error messages if user call close() in
> >sender thread?
> >
> >Thanks,
> >
> >Jiangjie (Becket) Qin
> >
> >On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
> >
> >>Hi Becket,
> >>
> >>Some comments on the wiki page:
> >>
> >>1. There are a few typos, for example "multivations", "wiithin".
> >>
> >>2. I think the main motivation could just be "current close() needs to
> >>block on flushing all buffered data, however there are scenarios in which
> >>producers would like to close without blocking on flushing data, or even
> >>close immediately and make sure any buffered data are dropped instead of
> >>sending out." You can probably give some examples for such scenarios.
> >>
> >>3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems
> >>any
> >>negative value has the same semantics.
> >>
> >>4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be
> >>called
> >>=> this is not programmatically enforced. Shall we just try to enforce it
> >>via, for example, checking if caller thread is the IO thread, and if yes
> >>just use close(-1)?
> >>
> >>5. Proposed Changes => it seems you only talked about the close(-1) case,
> >>how about close(>=0)?
> >>
> >>Guozhang
> >>
> >>On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> >>wrote:
> >>
> >>> Hey Joe & Jay,
> >>>
> >>> Thanks for the comments on the voting thread. Since it seems we
> >>>probably
> >>> will have more discussion on this, I am just replying from the
> >>>discussion
> >>> thread here.
> >>> I’ve updated the KIP page to make it less like half-baked, apologize
> >>>for
> >>> the rush...
> >>>
> >>> The contract in current KIP is:
> >>>   1. close() - wait until all requests either are sent or reach request
> >>> timeout.
> >>>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
> >>>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
> >>> until all requests are sent or reach request timeout
> >>>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
> >>>in 5
> >>> milliseconds, if something went wrong, just shutdown the producer
> >>>anyway,
> >>> my callback will handle the failures.
> >>>
> >>> About how we define what timeout value stands for, I actually struggled
> >>>a
> >>> little bit when wrote the patch. Intuitively, close(0) should mean
> >>> immediately, however it seems that all the existing java class have
> >>>this
> >>> convention of timeout=0 means no timeout or never timeout
> >>>(Thread.join(0),
> >>> Object.wait(0), etc.) So here the dilemma is either we follow the
> >>> intuition or we follow the convention. What I chose is to follow the
> >>> convention but document the interface to let user be aware of the
> >>>usage.
> >>> The reason is that I think producer.close() is a public interface so it
> >>> might be better to follow java convention. Whereas selector is not a
> >>> public interface that used by user, so as long as it makes sense to us,
> >>>it
> >>> is less a problem to be different from java convention. That said since
> >>> consumer.poll(timeout) is also a public interface, I think it also
> >>>makes
> >>> sense to make producer.close() to have the same definition of
> >>> consumer.poll(timeout).
> >>>
> >>> The main argument for keeping a timeout in close would be separating
> >>>the
> >>> close timeout from request timeout, which probably makes sense. I would
> >>> guess typically the request timeout would be long (e.g. 60 seconds)
> >>> because we might want to consider retries with back off time. If we
> >>>have
> >>> multiple batches in accumulator, in worst case that could take up to
> >>> several minutes to complete all the requests. But when we close a
> >>> producer, we might not want to wait for that long as it might cause
> >>>some
> >>> other problem like deployment tool timeout.
> >>>
> >>> There is also a subtle difference between close(timeout) and
> >>> flush(timeout). The only purpose for flush() is to write data to the
> >>> broker, so it makes perfect sense to wait until request timeout. I
> >>>think
> >>> that is why flush(timeout) looks strange. On the other hand, the top
> >>> priority for close() is to close the producer rather than flush() data,
> >>>so
> >>> close(timeout) gives guarantee on bounded waiting for its main job.
> >>>
> >>> Sorry for the confusion about forceClose flag. It is not a public
> >>> interface. I mentioned it in Proposed Changes section which I thought
> >>>was
> >>> supposed to provide implementation details.
> >>>
> >>> Thanks again for all the comments and suggestions!
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
> >>>
> >>> >The KIP page has been updated per Jay¹s comments.
> >>> >I¹d like to initiate the voting process if no further comments are
> >>> >received by tomorrow.
> >>> >
> >>> >Jiangjie (Becket) Qin
> >>> >
> >>> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >>> >
> >>> >>Hey Jiangjie,
> >>> >>
> >>> >>Can you capture the full motivation and use cases for the feature?
> >>>This
> >>> >>mentions your interest in having a way of aborting from inside the
> >>> >>Callback. But it doesn't really explain that usage or why other
> >>>people
> >>> >>would want to do that. It also doesn't list the primary use case for
> >>> >>having
> >>> >>close with a bounded timeout which was to avoid blocking too long on
> >>> >>shutdown.
> >>> >>
> >>> >>-Jay
> >>> >>
> >>> >>
> >>> >>
> >>> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >>><jqin@linkedin.com.invalid
> >>> >
> >>> >>wrote:
> >>> >>
> >>> >>> Hi,
> >>> >>>
> >>> >>> I just created a KIP for adding a close(timeout) to new producer.
> >>>Most
> >>> >>>of
> >>> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
> >>>has
> >>> >>> already done a lot of work.
> >>> >>> Since this is an interface change so we are going through the KIP
> >>> >>>process.
> >>> >>> Here is the KIP link:
> >>> >>>
> >>> >>>
> >>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >>> >>>2
> >>> >>>
> >>> >>> Thanks.
> >>> >>>
> >>> >>> Jiangjie (Becket) Qin
> >>> >>>
> >>> >
> >>>
> >>>
> >>
> >>
> >>--
> >>-- Guozhang
> >
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hi folks,

Since there are actually a long interleaved discussions in KAFKA-1660 and
KAFKA-1659, I summarized them in the rejected alternatives sections - the
section exists for a reason and I shouldn’t have always put None there. .
. It looks we took a while to agree upon the current interface.

We basically has the following requirements:

*Assuming request timeout is available*

1. Need to close a producer forcefully if a message send failed.
2. Need to close a producer forcefully even if no message send failed.
3. Need to close a producer elegantly but with a bounded waiting time.
4. Need to close a producer forcefully after an arbitrary waiting time
other than request timeout.

The options we have now are ():
A. close(timeout) - address all four.
B. Close() + abort.on.send.fail -  address 1) and 3)
C. abort() + close() - address 1) and 2) and 3)

So I think we can narrow the discussion to be whether 2) and 4) are valid
use case or not. As soon as we reach conclusion on what requirements need
to be met, we would have a conclusion to the solution.
I think 2) might be a valid because of some application level logic.
I am not sure if 4) is a valid requirement, but form KAFKA-1660, my
understanding is it is.

Any idea?

Jiangjie (Becket) Qin


On 3/12/15, 11:50 AM, "Jiangjie Qin" <jq...@linkedin.com> wrote:

>Hey Guozhang,
>
>Thanks for the comments. I updated the page as suggested.
>For 3), that’s right, I put this in java doc. Do you think we need to
>reject value other than -1?
>For 4), I think user will notice this easily because the thread will block
>and producer is not going to shutdown. About using close(-1) quietly in
>sender thread when close() is called, my concern is that user might not be
>aware of that. Maybe we can put an error messages if user call close() in
>sender thread?
>
>Thanks,
>
>Jiangjie (Becket) Qin
>
>On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:
>
>>Hi Becket,
>>
>>Some comments on the wiki page:
>>
>>1. There are a few typos, for example "multivations", "wiithin".
>>
>>2. I think the main motivation could just be "current close() needs to
>>block on flushing all buffered data, however there are scenarios in which
>>producers would like to close without blocking on flushing data, or even
>>close immediately and make sure any buffered data are dropped instead of
>>sending out." You can probably give some examples for such scenarios.
>>
>>3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems
>>any
>>negative value has the same semantics.
>>
>>4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be
>>called
>>=> this is not programmatically enforced. Shall we just try to enforce it
>>via, for example, checking if caller thread is the IO thread, and if yes
>>just use close(-1)?
>>
>>5. Proposed Changes => it seems you only talked about the close(-1) case,
>>how about close(>=0)?
>>
>>Guozhang
>>
>>On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>>wrote:
>>
>>> Hey Joe & Jay,
>>>
>>> Thanks for the comments on the voting thread. Since it seems we
>>>probably
>>> will have more discussion on this, I am just replying from the
>>>discussion
>>> thread here.
>>> I’ve updated the KIP page to make it less like half-baked, apologize
>>>for
>>> the rush...
>>>
>>> The contract in current KIP is:
>>>   1. close() - wait until all requests either are sent or reach request
>>> timeout.
>>>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>>>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
>>> until all requests are sent or reach request timeout
>>>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
>>>in 5
>>> milliseconds, if something went wrong, just shutdown the producer
>>>anyway,
>>> my callback will handle the failures.
>>>
>>> About how we define what timeout value stands for, I actually struggled
>>>a
>>> little bit when wrote the patch. Intuitively, close(0) should mean
>>> immediately, however it seems that all the existing java class have
>>>this
>>> convention of timeout=0 means no timeout or never timeout
>>>(Thread.join(0),
>>> Object.wait(0), etc.) So here the dilemma is either we follow the
>>> intuition or we follow the convention. What I chose is to follow the
>>> convention but document the interface to let user be aware of the
>>>usage.
>>> The reason is that I think producer.close() is a public interface so it
>>> might be better to follow java convention. Whereas selector is not a
>>> public interface that used by user, so as long as it makes sense to us,
>>>it
>>> is less a problem to be different from java convention. That said since
>>> consumer.poll(timeout) is also a public interface, I think it also
>>>makes
>>> sense to make producer.close() to have the same definition of
>>> consumer.poll(timeout).
>>>
>>> The main argument for keeping a timeout in close would be separating
>>>the
>>> close timeout from request timeout, which probably makes sense. I would
>>> guess typically the request timeout would be long (e.g. 60 seconds)
>>> because we might want to consider retries with back off time. If we
>>>have
>>> multiple batches in accumulator, in worst case that could take up to
>>> several minutes to complete all the requests. But when we close a
>>> producer, we might not want to wait for that long as it might cause
>>>some
>>> other problem like deployment tool timeout.
>>>
>>> There is also a subtle difference between close(timeout) and
>>> flush(timeout). The only purpose for flush() is to write data to the
>>> broker, so it makes perfect sense to wait until request timeout. I
>>>think
>>> that is why flush(timeout) looks strange. On the other hand, the top
>>> priority for close() is to close the producer rather than flush() data,
>>>so
>>> close(timeout) gives guarantee on bounded waiting for its main job.
>>>
>>> Sorry for the confusion about forceClose flag. It is not a public
>>> interface. I mentioned it in Proposed Changes section which I thought
>>>was
>>> supposed to provide implementation details.
>>>
>>> Thanks again for all the comments and suggestions!
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>>>
>>> >The KIP page has been updated per Jay¹s comments.
>>> >I¹d like to initiate the voting process if no further comments are
>>> >received by tomorrow.
>>> >
>>> >Jiangjie (Becket) Qin
>>> >
>>> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>>> >
>>> >>Hey Jiangjie,
>>> >>
>>> >>Can you capture the full motivation and use cases for the feature?
>>>This
>>> >>mentions your interest in having a way of aborting from inside the
>>> >>Callback. But it doesn't really explain that usage or why other
>>>people
>>> >>would want to do that. It also doesn't list the primary use case for
>>> >>having
>>> >>close with a bounded timeout which was to avoid blocking too long on
>>> >>shutdown.
>>> >>
>>> >>-Jay
>>> >>
>>> >>
>>> >>
>>> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>>><jqin@linkedin.com.invalid
>>> >
>>> >>wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> I just created a KIP for adding a close(timeout) to new producer.
>>>Most
>>> >>>of
>>> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
>>>has
>>> >>> already done a lot of work.
>>> >>> Since this is an interface change so we are going through the KIP
>>> >>>process.
>>> >>> Here is the KIP link:
>>> >>>
>>> >>>
>>> 
>>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>>> >>>2
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> Jiangjie (Becket) Qin
>>> >>>
>>> >
>>>
>>>
>>
>>
>>-- 
>>-- Guozhang
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hey Guozhang,

Thanks for the comments. I updated the page as suggested.
For 3), that’s right, I put this in java doc. Do you think we need to
reject value other than -1?
For 4), I think user will notice this easily because the thread will block
and producer is not going to shutdown. About using close(-1) quietly in
sender thread when close() is called, my concern is that user might not be
aware of that. Maybe we can put an error messages if user call close() in
sender thread?

Thanks,

Jiangjie (Becket) Qin

On 3/12/15, 5:13 AM, "Guozhang Wang" <wa...@gmail.com> wrote:

>Hi Becket,
>
>Some comments on the wiki page:
>
>1. There are a few typos, for example "multivations", "wiithin".
>
>2. I think the main motivation could just be "current close() needs to
>block on flushing all buffered data, however there are scenarios in which
>producers would like to close without blocking on flushing data, or even
>close immediately and make sure any buffered data are dropped instead of
>sending out." You can probably give some examples for such scenarios.
>
>3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems
>any
>negative value has the same semantics.
>
>4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be called
>=> this is not programmatically enforced. Shall we just try to enforce it
>via, for example, checking if caller thread is the IO thread, and if yes
>just use close(-1)?
>
>5. Proposed Changes => it seems you only talked about the close(-1) case,
>how about close(>=0)?
>
>Guozhang
>
>On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hey Joe & Jay,
>>
>> Thanks for the comments on the voting thread. Since it seems we probably
>> will have more discussion on this, I am just replying from the
>>discussion
>> thread here.
>> I’ve updated the KIP page to make it less like half-baked, apologize for
>> the rush...
>>
>> The contract in current KIP is:
>>   1. close() - wait until all requests either are sent or reach request
>> timeout.
>>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
>> until all requests are sent or reach request timeout
>>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending
>>in 5
>> milliseconds, if something went wrong, just shutdown the producer
>>anyway,
>> my callback will handle the failures.
>>
>> About how we define what timeout value stands for, I actually struggled
>>a
>> little bit when wrote the patch. Intuitively, close(0) should mean
>> immediately, however it seems that all the existing java class have this
>> convention of timeout=0 means no timeout or never timeout
>>(Thread.join(0),
>> Object.wait(0), etc.) So here the dilemma is either we follow the
>> intuition or we follow the convention. What I chose is to follow the
>> convention but document the interface to let user be aware of the usage.
>> The reason is that I think producer.close() is a public interface so it
>> might be better to follow java convention. Whereas selector is not a
>> public interface that used by user, so as long as it makes sense to us,
>>it
>> is less a problem to be different from java convention. That said since
>> consumer.poll(timeout) is also a public interface, I think it also makes
>> sense to make producer.close() to have the same definition of
>> consumer.poll(timeout).
>>
>> The main argument for keeping a timeout in close would be separating the
>> close timeout from request timeout, which probably makes sense. I would
>> guess typically the request timeout would be long (e.g. 60 seconds)
>> because we might want to consider retries with back off time. If we have
>> multiple batches in accumulator, in worst case that could take up to
>> several minutes to complete all the requests. But when we close a
>> producer, we might not want to wait for that long as it might cause some
>> other problem like deployment tool timeout.
>>
>> There is also a subtle difference between close(timeout) and
>> flush(timeout). The only purpose for flush() is to write data to the
>> broker, so it makes perfect sense to wait until request timeout. I think
>> that is why flush(timeout) looks strange. On the other hand, the top
>> priority for close() is to close the producer rather than flush() data,
>>so
>> close(timeout) gives guarantee on bounded waiting for its main job.
>>
>> Sorry for the confusion about forceClose flag. It is not a public
>> interface. I mentioned it in Proposed Changes section which I thought
>>was
>> supposed to provide implementation details.
>>
>> Thanks again for all the comments and suggestions!
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>>
>> >The KIP page has been updated per Jay¹s comments.
>> >I¹d like to initiate the voting process if no further comments are
>> >received by tomorrow.
>> >
>> >Jiangjie (Becket) Qin
>> >
>> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>> >
>> >>Hey Jiangjie,
>> >>
>> >>Can you capture the full motivation and use cases for the feature?
>>This
>> >>mentions your interest in having a way of aborting from inside the
>> >>Callback. But it doesn't really explain that usage or why other people
>> >>would want to do that. It also doesn't list the primary use case for
>> >>having
>> >>close with a bounded timeout which was to avoid blocking too long on
>> >>shutdown.
>> >>
>> >>-Jay
>> >>
>> >>
>> >>
>> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
>><jqin@linkedin.com.invalid
>> >
>> >>wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I just created a KIP for adding a close(timeout) to new producer.
>>Most
>> >>>of
>> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt
>>has
>> >>> already done a lot of work.
>> >>> Since this is an interface change so we are going through the KIP
>> >>>process.
>> >>> Here is the KIP link:
>> >>>
>> >>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>> >>>2
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Jiangjie (Becket) Qin
>> >>>
>> >
>>
>>
>
>
>-- 
>-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Guozhang Wang <wa...@gmail.com>.

Hi Becket,

Some comments on the wiki page:

1. There are a few typos, for example "multivations", "wiithin".

2. I think the main motivation could just be "current close() needs to
block on flushing all buffered data, however there are scenarios in which
producers would like to close without blocking on flushing data, or even
close immediately and make sure any buffered data are dropped instead of
sending out." You can probably give some examples for such scenarios.

3. close(-1, TimeUnit.MILLISECONDS) => from the implementation it seems any
negative value has the same semantics.

4. In sender thread only close(-1, TimeUnit.MILLISECONDS) should be called
=> this is not programmatically enforced. Shall we just try to enforce it
via, for example, checking if caller thread is the IO thread, and if yes
just use close(-1)?

5. Proposed Changes => it seems you only talked about the close(-1) case,
how about close(>=0)?

Guozhang

On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hey Joe & Jay,
>
> Thanks for the comments on the voting thread. Since it seems we probably
> will have more discussion on this, I am just replying from the discussion
> thread here.
> I’ve updated the KIP page to make it less like half-baked, apologize for
> the rush...
>
> The contract in current KIP is:
>   1. close() - wait until all requests either are sent or reach request
> timeout.
>   2. close(-1, TimeUnit.MILLISECONDS) - close immediately
>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
> until all requests are sent or reach request timeout
>   4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending in 5
> milliseconds, if something went wrong, just shutdown the producer anyway,
> my callback will handle the failures.
>
> About how we define what timeout value stands for, I actually struggled a
> little bit when wrote the patch. Intuitively, close(0) should mean
> immediately, however it seems that all the existing java class have this
> convention of timeout=0 means no timeout or never timeout (Thread.join(0),
> Object.wait(0), etc.) So here the dilemma is either we follow the
> intuition or we follow the convention. What I chose is to follow the
> convention but document the interface to let user be aware of the usage.
> The reason is that I think producer.close() is a public interface so it
> might be better to follow java convention. Whereas selector is not a
> public interface that used by user, so as long as it makes sense to us, it
> is less a problem to be different from java convention. That said since
> consumer.poll(timeout) is also a public interface, I think it also makes
> sense to make producer.close() to have the same definition of
> consumer.poll(timeout).
>
> The main argument for keeping a timeout in close would be separating the
> close timeout from request timeout, which probably makes sense. I would
> guess typically the request timeout would be long (e.g. 60 seconds)
> because we might want to consider retries with back off time. If we have
> multiple batches in accumulator, in worst case that could take up to
> several minutes to complete all the requests. But when we close a
> producer, we might not want to wait for that long as it might cause some
> other problem like deployment tool timeout.
>
> There is also a subtle difference between close(timeout) and
> flush(timeout). The only purpose for flush() is to write data to the
> broker, so it makes perfect sense to wait until request timeout. I think
> that is why flush(timeout) looks strange. On the other hand, the top
> priority for close() is to close the producer rather than flush() data, so
> close(timeout) gives guarantee on bounded waiting for its main job.
>
> Sorry for the confusion about forceClose flag. It is not a public
> interface. I mentioned it in Proposed Changes section which I thought was
> supposed to provide implementation details.
>
> Thanks again for all the comments and suggestions!
>
> Jiangjie (Becket) Qin
>
> On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:
>
> >The KIP page has been updated per Jay¹s comments.
> >I¹d like to initiate the voting process if no further comments are
> >received by tomorrow.
> >
> >Jiangjie (Becket) Qin
> >
> >On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
> >
> >>Hey Jiangjie,
> >>
> >>Can you capture the full motivation and use cases for the feature? This
> >>mentions your interest in having a way of aborting from inside the
> >>Callback. But it doesn't really explain that usage or why other people
> >>would want to do that. It also doesn't list the primary use case for
> >>having
> >>close with a bounded timeout which was to avoid blocking too long on
> >>shutdown.
> >>
> >>-Jay
> >>
> >>
> >>
> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jqin@linkedin.com.invalid
> >
> >>wrote:
> >>
> >>> Hi,
> >>>
> >>> I just created a KIP for adding a close(timeout) to new producer. Most
> >>>of
> >>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
> >>> already done a lot of work.
> >>> Since this is an interface change so we are going through the KIP
> >>>process.
> >>> Here is the KIP link:
> >>>
> >>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >>>2
> >>>
> >>> Thanks.
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >
>
>


-- 
-- Guozhang

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

Hey Joe & Jay,

Thanks for the comments on the voting thread. Since it seems we probably
will have more discussion on this, I am just replying from the discussion
thread here.
I’ve updated the KIP page to make it less like half-baked, apologize for
the rush...

The contract in current KIP is:
  1. close() - wait until all requests either are sent or reach request
timeout.
  2. close(-1, TimeUnit.MILLISECONDS) - close immediately
  3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), i.e. Wait
until all requests are sent or reach request timeout
  4. close(5, TimeUnit.MILLISECONDS) - try the best to finish sending in 5
milliseconds, if something went wrong, just shutdown the producer anyway,
my callback will handle the failures.

About how we define what timeout value stands for, I actually struggled a
little bit when wrote the patch. Intuitively, close(0) should mean
immediately, however it seems that all the existing java class have this
convention of timeout=0 means no timeout or never timeout (Thread.join(0),
Object.wait(0), etc.) So here the dilemma is either we follow the
intuition or we follow the convention. What I chose is to follow the
convention but document the interface to let user be aware of the usage.
The reason is that I think producer.close() is a public interface so it
might be better to follow java convention. Whereas selector is not a
public interface that used by user, so as long as it makes sense to us, it
is less a problem to be different from java convention. That said since
consumer.poll(timeout) is also a public interface, I think it also makes
sense to make producer.close() to have the same definition of
consumer.poll(timeout).

The main argument for keeping a timeout in close would be separating the
close timeout from request timeout, which probably makes sense. I would
guess typically the request timeout would be long (e.g. 60 seconds)
because we might want to consider retries with back off time. If we have
multiple batches in accumulator, in worst case that could take up to
several minutes to complete all the requests. But when we close a
producer, we might not want to wait for that long as it might cause some
other problem like deployment tool timeout.

There is also a subtle difference between close(timeout) and
flush(timeout). The only purpose for flush() is to write data to the
broker, so it makes perfect sense to wait until request timeout. I think
that is why flush(timeout) looks strange. On the other hand, the top
priority for close() is to close the producer rather than flush() data, so
close(timeout) gives guarantee on bounded waiting for its main job.

Sorry for the confusion about forceClose flag. It is not a public
interface. I mentioned it in Proposed Changes section which I thought was
supposed to provide implementation details.

Thanks again for all the comments and suggestions!

Jiangjie (Becket) Qin

On 3/10/15, 8:57 PM, "Jiangjie Qin" <jq...@linkedin.com> wrote:

>The KIP page has been updated per Jay¹s comments.
>I¹d like to initiate the voting process if no further comments are
>received by tomorrow.
>
>Jiangjie (Becket) Qin
>
>On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:
>
>>Hey Jiangjie,
>>
>>Can you capture the full motivation and use cases for the feature? This
>>mentions your interest in having a way of aborting from inside the
>>Callback. But it doesn't really explain that usage or why other people
>>would want to do that. It also doesn't list the primary use case for
>>having
>>close with a bounded timeout which was to avoid blocking too long on
>>shutdown.
>>
>>-Jay
>>
>>
>>
>>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>>wrote:
>>
>>> Hi,
>>>
>>> I just created a KIP for adding a close(timeout) to new producer. Most
>>>of
>>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
>>> already done a lot of work.
>>> Since this is an interface change so we are going through the KIP
>>>process.
>>> Here is the KIP link:
>>> 
>>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
>>>2
>>>
>>> Thanks.
>>>
>>> Jiangjie (Becket) Qin
>>>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jiangjie Qin <jq...@linkedin.com.INVALID>.

The KIP page has been updated per Jay¹s comments.
I¹d like to initiate the voting process if no further comments are
received by tomorrow.

Jiangjie (Becket) Qin

On 3/8/15, 9:45 AM, "Jay Kreps" <ja...@gmail.com> wrote:

>Hey Jiangjie,
>
>Can you capture the full motivation and use cases for the feature? This
>mentions your interest in having a way of aborting from inside the
>Callback. But it doesn't really explain that usage or why other people
>would want to do that. It also doesn't list the primary use case for
>having
>close with a bounded timeout which was to avoid blocking too long on
>shutdown.
>
>-Jay
>
>
>
>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
>wrote:
>
>> Hi,
>>
>> I just created a KIP for adding a close(timeout) to new producer. Most
>>of
>> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
>> already done a lot of work.
>> Since this is an interface change so we are going through the KIP
>>process.
>> Here is the KIP link:
>> 
>>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53739782
>>
>> Thanks.
>>
>> Jiangjie (Becket) Qin
>>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Posted by Jay Kreps <ja...@gmail.com>.

Hey Jiangjie,

Can you capture the full motivation and use cases for the feature? This
mentions your interest in having a way of aborting from inside the
Callback. But it doesn't really explain that usage or why other people
would want to do that. It also doesn't list the primary use case for having
close with a bounded timeout which was to avoid blocking too long on
shutdown.

-Jay

On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin <jq...@linkedin.com.invalid>
wrote:

> Hi,
>
> I just created a KIP for adding a close(timeout) to new producer. Most of
> the previous discussions are in KAFKA-1660 where Parth Brahmbhatt has
> already done a lot of work.
> Since this is an interface change so we are going through the KIP process.
> Here is the KIP link:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53739782
>
> Thanks.
>
> Jiangjie (Becket) Qin
>