You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Matt Farmer <ma...@frmr.me> on 2017/10/18 21:40:28 UTC

[DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Hello everyone,

This is the discussion thread for the KIP that I just filed here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce

Looking forward to getting some feedback from folks about this idea and
working toward a solution we can contribute back. :)

Cheers,
Matt Farmer

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Makes sense. I created a JIRA for it:

https://issues.apache.org/jira/browse/KAFKA-6376

@Matt: feel free to pick it up if you are interested
(or anybody else :))


-Matthias

On 12/13/17 4:07 PM, Guozhang Wang wrote:
> Metrics: this is a good point.
> 
> Note that currently we have two metrics for `skipped-records` on different
> levels:
> 
> 1) on the highest level, the thread-level, we have a `skipped-records`,
> that records all the skipped records due to deserialization errors.
> 2) on the lower processor-node level, we have a
> `skippedDueToDeserializationError`, that records the skipped records on
> that specific source node due to deserialization errors.
> 
> 
> So you can see that 1) does not cover any other scenarios and can just be
> thought of as an aggregate of 2) across all the tasks' source nodes.
> However, there are other places that can cause a record to be dropped, for
> example:
> 
> 1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
> dropped due to window elapsed.
> 2) KIP-210: records could be dropped on the producer side.
> 3) records could be dropped during user-customized processing on errors.
> 
> 
> I think improving the skipped records of all these scenarios itself worth
> having another KIP; so I'd suggest we do not drag this KIP-210 into this.
> 
> 
> Guozhang
> 
> 
> On Wed, Dec 13, 2017 at 3:45 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
> 
>> One more after thought: should we add a metric for this? We also have a
>> metric for `skippedDueToDeserializationError-rate` ?
>>
>>
>> -Matthias
>>
>>
>>
>> On 12/6/17 7:54 AM, Bill Bejeck wrote:
>>> Thanks for the clearly written KIP, no further comments from my end.
>>>
>>> -Bill
>>>
>>> On Wed, Dec 6, 2017 at 9:52 AM, Matt Farmer <ma...@frmr.me> wrote:
>>>
>>>> There is already a vote thread for this KIP. I can bump it so that it’s
>>>> towards the top of your inbox.
>>>>
>>>> With regard to your concerns:
>>>>
>>>> 1) We do not have the "ProductionExceptionHandler" interface defined in
>> the
>>>> wiki page, thought it is sort of clear that it is a one-function
>> interface
>>>> with record and exception. Could you add it?
>>>>
>>>>
>>>> It is defined, it’s just not defined using a code snippet. The KIP
>> reads as
>>>> follows:
>>>>
>>>> ===
>>>>
>>>> A public interface named ProductionExceptionHandler with a single
>> method,
>>>> handle, that has the following signature:
>>>>
>>>>    - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
>>>>    byte[]> record, Exception exception)
>>>>
>>>>
>>>> ===
>>>>
>>>> If you’d like me to add a code snippet illustrating this that’s simple
>> for
>>>> me to do, but it seemed superfluous.
>>>>
>>>> 2) A quick question about your example code: where would be the "logger"
>>>> object be created?
>>>>
>>>>
>>>> SLF4J loggers are typically created as a class member in the class. Such
>>>> as:
>>>>
>>>> private Logger logger = LoggerFactory.getLogger(HelloWorld.class);
>>>>
>>>> I omit that in my implementation examples for brevity.
>>>>
>>>> On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com)
>>>> wrote:
>>>>
>>>> Hello Matt,
>>>>
>>>> Thanks for writing up the KIP. I made a pass over it and here is a few
>>>> minor comments. I think you can consider starting a voting thread for
>> this
>>>> KIP while addressing them.
>>>>
>>>> 1) We do not have the "ProductionExceptionHandler" interface defined in
>> the
>>>> wiki page, thought it is sort of clear that it is a one-function
>> interface
>>>> with record and exception. Could you add it?
>>>>
>>>> 2) A quick question about your example code: where would be the "logger"
>>>> object be created? Note that the implementation of this interface have
>> to
>>>> give a non-param constructor, or as a static field of the class but in
>> that
>>>> case you would not be able to log which instance is throwing this error
>> (we
>>>> may have multiple producers within a single instance, even within a
>>>> thread). Just a reminder to consider in your implementation.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
>>>> wrote:
>>>>
>>>>> Thanks a lot for the update! Great write-up! Very clearly explained
>> what
>>>>> the change will look like!
>>>>>
>>>>> Looks good to me. No further comments from my side.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 12/5/17 9:14 AM, Matt Farmer wrote:
>>>>>> I have updated this KIP accordingly.
>>>>>>
>>>>>> Can you please take a look and let me know if what I wrote looks
>>>> correct
>>>>> to
>>>>>> you?
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>>>> Streams+fails+to+produce
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>> On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
>>>>>>
>>>>>> Hey Matthias, thanks for getting back to me.
>>>>>>
>>>>>> That's fine. But if we add it to `test` package, we don't need to talk
>>>>>> about it in the KIP. `test` is not public API.
>>>>>>
>>>>>> Yes, that makes sense. It was in the KIP originally because I was, at
>>>> one
>>>>>> point, planning on including it. We can remove it now that we’ve
>>>> decided
>>>>> we
>>>>>> won’t include it in the public API.
>>>>>>
>>>>>> Understood. That makes sense. We should explain this clearly in the
>> KIP
>>>>>> and maybe log all other following exceptions at DEBUG level?
>>>>>>
>>>>>>
>>>>>> I thought it was clear in the KIP, but I can go back and double check
>>>> my
>>>>>> wording and revise it to try and make it clearer.
>>>>>>
>>>>>> I’ll take a look at doing more work on the KIP and the Pull Request
>>>>>> tomorrow.
>>>>>>
>>>>>> Thanks again!
>>>>>>
>>>>>> On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
>>>>> matthias@confluent.io)
>>>>>> wrote:
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> About your questions:
>>>>>>
>>>>>>>>> Acknowledged, so is ProducerFencedException the only kind of
>>>>> exception I
>>>>>>>>> need to change my behavior on? Or are there other types I need to
>>>>>> check? Is
>>>>>>>>> there a comprehensive list somewhere?
>>>>>>
>>>>>> I cannot think if any other atm. We should list all fatal exceptions
>>>> for
>>>>>> which we don't call the handler and explain why (exception is "global"
>>>>>> and will affect all other records, too | ProducerFenced is
>>>> self-healing).
>>>>>>
>>>>>> We started to collect and categorize exception here (not completed
>>>> yet):
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>>>> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
>>>>>> :
>>>>>>
>>>>>> This list should be a good starting point though.
>>>>>>
>>>>>>> I include it in the test package because I have tests that assert
>> that
>>>>> if
>>>>>>> the record collector impl encounters an Exception and receives a
>>>>> CONTINUE
>>>>>>> that it actually does CONTINUE.
>>>>>>
>>>>>> That's fine. But if we add it to `test` package, we don't need to talk
>>>>>> about it in the KIP. `test` is not public API.
>>>>>>
>>>>>>> I didn't want to invoke the handler in places where the CONTINUE or
>>>> FAIL
>>>>>>> result would just be ignored. Presumably, after a FAIL has been
>>>>> returned,
>>>>>>> subsequent exceptions are likely to be repeats or noise from my
>>>>>>> understanding of the code paths. If this is incorrect we can revisit.
>>>>>>
>>>>>> Understood. That makes sense. We should explain this clearly in the
>> KIP
>>>>>> and maybe log all other following exceptions at DEBUG level?
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>> On 12/1/17 11:43 AM, Matt Farmer wrote:
>>>>>>> Bump! It's been three days here and I haven't seen any further
>>>> feedback.
>>>>>>> Eager to get this resolved, approved, and merged. =)
>>>>>>>
>>>>>>> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>
>>>>>>>> Hi there, sorry for the delay in responding. Last week had a holiday
>>>>> and
>>>>>>>> several busy work days in it so I'm just now getting around to
>>>>>> responding.
>>>>>>>>
>>>>>>>>> We would only exclude
>>>>>>>>> exception Streams can handle itself (like ProducerFencedException)
>>>> --
>>>>>>>>> thus, if the handler has code to react to this, it would not be
>> bad,
>>>>> as
>>>>>>>>> this code is just never called.
>>>>>>>> [...]
>>>>>>>>> Thus, I am still in favor of not calling the
>>>>> ProductionExceptionHandler
>>>>>>>>> for fatal exception.
>>>>>>>>
>>>>>>>> Acknowledged, so is ProducerFencedException the only kind of
>>>> exception
>>>>> I
>>>>>>>> need to change my behavior on? Or are there other types I need to
>>>>> check?
>>>>>> Is
>>>>>>>> there a comprehensive list somewhere?
>>>>>>>>
>>>>>>>>> About the "always continue" case. Sounds good to me to remove it
>>>> (not
>>>>>>>>> sure why we need it in test package?)
>>>>>>>>
>>>>>>>> I include it in the test package because I have tests that assert
>>>> that
>>>>> if
>>>>>>>> the record collector impl encounters an Exception and receives a
>>>>> CONTINUE
>>>>>>>> that it actually does CONTINUE.
>>>>>>>>
>>>>>>>>> What is there reasoning for invoking the handler only for the first
>>>>>>>>> exception?
>>>>>>>>
>>>>>>>> I didn't want to invoke the handler in places where the CONTINUE or
>>>>> FAIL
>>>>>>>> result would just be ignored. Presumably, after a FAIL has been
>>>>> returned,
>>>>>>>> subsequent exceptions are likely to be repeats or noise from my
>>>>>>>> understanding of the code paths. If this is incorrect we can
>> revisit.
>>>>>>>>
>>>>>>>> Once I get the answers to these questions I can make another pass on
>>>>> the
>>>>>>>> pull request!
>>>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <
>>>> matthias@confluent.io
>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for following up!
>>>>>>>>>
>>>>>>>>> One thought about an older reply from you:
>>>>>>>>>
>>>>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
>>>> *just*
>>>>>> to
>>>>>>>>>>>>> make a decision for streams. There may also be desirable side
>>>>>>>>> effects that
>>>>>>>>>>>>> users wish to cause when production exceptions occur. There may
>>>> be
>>>>>>>>>>>>> side-effects that they wish to cause when
>>>> AuthenticationExceptions
>>>>>>>>> occur,
>>>>>>>>>>>>> as well. We should still give them the hooks to preform those
>>>> side
>>>>>>>>> effects.
>>>>>>>>>
>>>>>>>>> And your follow up:
>>>>>>>>>
>>>>>>>>>>> - I think I would rather invoke it for all exceptions that could
>>>>>>>>> occur
>>>>>>>>>>> from attempting to produce - even those exceptions were returning
>>>>>>>>> CONTINUE
>>>>>>>>>>> may not be a good idea (e.g. Authorization exception). Until
>> there
>>>>>>>>> is a
>>>>>>>>>>> different type for exceptions that are totally fatal (for example
>>>> a
>>>>>>>>>>> FatalStreamsException or some sort), maintaining a list of
>>>>>>>>> exceptions that
>>>>>>>>>>> you can intercept with this handler and exceptions you cannot
>>>> would
>>>>>>>>> be
>>>>>>>>>>> really error-prone and hard to keep correct.
>>>>>>>>>
>>>>>>>>> I understand what you are saying, however, consider that Streams
>>>> needs
>>>>>>>>> to die for a fatal exception. Thus, if you call the handler for a
>>>>> fatal
>>>>>>>>> exception, we would need to ignore the return value and fail --
>> this
>>>>>>>>> seems to be rather intuitive. Furthermore, users can register an
>>>>>>>>> uncaught-exception-handler and side effects for fatal errors can be
>>>>>>>>> triggered there.
>>>>>>>>>
>>>>>>>>> Btw: an AuthorizationException is fatal -- not sure what you mean
>> by
>>>>> an
>>>>>>>>> "totally fatal" exception -- there is no superlative to fatal from
>>>> my
>>>>>>>>> understanding.
>>>>>>>>>
>>>>>>>>> About maintaining a list of exceptions: I don't think this is too
>>>>> hard,
>>>>>>>>> and users also don't need to worry about it IMHO. We would only
>>>>> exclude
>>>>>>>>> exception Streams can handle itself (like ProducerFencedException)
>>>> --
>>>>>>>>> thus, if the handler has code to react to this, it would not be
>> bad,
>>>>> as
>>>>>>>>> this code is just never called. In case that there is an exception
>>>>>>>>> Streams could actually handle but we still call the handler (ie,
>>>> bug),
>>>>>>>>> there should be no harm either -- the handler needs to return
>> either
>>>>>>>>> CONTINUE or FAIL and we would obey. It could only happen, that
>>>> Streams
>>>>>>>>> dies---as request by the user(!)---even if Streams could actually
>>>>> handle
>>>>>>>>> the error and move on. But this seems to be not a issue from my
>>>> point
>>>>> of
>>>>>>>>> view.
>>>>>>>>>
>>>>>>>>> Thus, I am still in favor of not calling the
>>>>> ProductionExceptionHandler
>>>>>>>>> for fatal exception.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> About the "always continue" case. Sounds good to me to remove it
>>>> (not
>>>>>>>>> sure why we need it in test package?) and to rename the "failing"
>>>>>>>>> handler to "Default" (even if "default" is less descriptive and I
>>>>> would
>>>>>>>>> still prefer "Fail" in the name).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Last question:
>>>>>>>>>
>>>>>>>>>>> - Continue to *only* invoke it on the first exception that we
>>>>>>>>>>> encounter (before sendException is set)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What is there reasoning for invoking the handler only for the first
>>>>>>>>> exception?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>>>>>>>> Alright, here are some updates I'm planning to make after thinking
>>>> on
>>>>>>>>> this
>>>>>>>>>> for awhile:
>>>>>>>>>>
>>>>>>>>>> - Given that the "always continue" handler isn't something I'd
>>>>>>>>> recommend
>>>>>>>>>> for production use as is, I'm going to move it into the test
>>>>>>>>> namespace and
>>>>>>>>>> remove it from mention in the public API.
>>>>>>>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler"
>>>> to
>>>>>>>>>> "DefaultProductionExceptionHandler"
>>>>>>>>>> - Given that the API for the exception handler involves being
>>>>>>>>> invoked by
>>>>>>>>>> streams to make a decision about whether or not to continue — I
>>>>>>>>> think that
>>>>>>>>>> we should:
>>>>>>>>>> - Continue to *only* invoke it on the first exception that we
>>>>>>>>>> encounter (before sendException is set)
>>>>>>>>>> - Stop invoking it for the self-healing fenced exceptions.
>>>>>>>>>> - I think I would rather invoke it for all exceptions that could
>>>>>>>>> occur
>>>>>>>>>> from attempting to produce - even those exceptions were returning
>>>>>>>>> CONTINUE
>>>>>>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>>>>>>>> is a
>>>>>>>>>> different type for exceptions that are totally fatal (for example
>> a
>>>>>>>>>> FatalStreamsException or some sort), maintaining a list of
>>>>>>>>> exceptions that
>>>>>>>>>> you can intercept with this handler and exceptions you cannot
>> would
>>>>>>>>> be
>>>>>>>>>> really error-prone and hard to keep correct.
>>>>>>>>>> - I'm happy to file a KIP for the creation of this new Exception
>>>>>>>>> type
>>>>>>>>>> if there is interest.
>>>>>>>>>>
>>>>>>>>>> @ Matthias — What do you think about the above?
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>>>>
>>>>>>>>>>> I responded before reading your code review and didn't see the
>> bit
>>>>>>>>> about
>>>>>>>>>>> how ProducerFencedException is self-healing. This error handling
>>>>> logic
>>>>>>>>> is
>>>>>>>>>>> *quite* confusing to reason about... I think I'm going to sit
>> down
>>>>>> with
>>>>>>>>>>> the code a bit more today, but I'm inclined to think that if the
>>>>>> fenced
>>>>>>>>>>> exceptions are, indeed, self healing that we still invoke the
>>>>> handler
>>>>>>>>> but
>>>>>>>>>>> ignore its result for those exceptions.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me>
>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>
>>>>>>>>>>>> Following up here...
>>>>>>>>>>>>
>>>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
>> the
>>>>>>>>>>>> handler implementation names -- it sounds "cleaner" to me
>> without
>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me think on this. I generally prefer expressiveness to
>>>>>> clean-ness,
>>>>>>>>>>>> and I think that comes out in the names I chose for things. The
>>>>> straw
>>>>>>>>> man
>>>>>>>>>>>> in my head is the person that tried to substitute in the
>>>>>>>>> "AlwaysContinue"
>>>>>>>>>>>> variant and the "Always" is a trigger to really consider whether
>>>> or
>>>>>>>>> not
>>>>>>>>>>>> they *always* want to try to continue.
>>>>>>>>>>>>
>>>>>>>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
>>>>>> variant
>>>>>>>>>>>> isn't something I'd recommend generally in a production system.
>>>>>>>>> Ideally
>>>>>>>>>>>> these handlers are targeted at handling a specific series of
>>>>>>>>> exceptions
>>>>>>>>>>>> that a user wants to ignore, and not ignoring all exceptions.
>>>> More
>>>>> on
>>>>>>>>> this
>>>>>>>>>>>> in a minute.
>>>>>>>>>>>>
>>>>>>>>>>>>> For the first category, it seems to not make sense to call the
>>>>>>>>> handle but
>>>>>>>>>>>> Streams should always fail. If we follow this design, the KIP
>>>>> should
>>>>>>>>> list
>>>>>>>>>>>> all exceptions for which the handler is not called.
>>>>>>>>>>>>
>>>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
>>>> *just*
>>>>> to
>>>>>>>>>>>> make a decision for streams. There may also be desirable side
>>>>> effects
>>>>>>>>> that
>>>>>>>>>>>> users wish to cause when production exceptions occur. There may
>>>> be
>>>>>>>>>>>> side-effects that they wish to cause when
>>>> AuthenticationExceptions
>>>>>>>>> occur,
>>>>>>>>>>>> as well. We should still give them the hooks to preform those
>>>> side
>>>>>>>>> effects.
>>>>>>>>>>>>
>>>>>>>>>>>> In light of the above, I'm thinking that the
>>>>>>>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
>>>>> probably be
>>>>>>>>>>>> removed from the public API and moved into tests since that's
>>>> where
>>>>>>>>> it's
>>>>>>>>>>>> most useful. The more I think on it, the more I feel that having
>>>>> that
>>>>>>>>> in
>>>>>>>>>>>> the public API is a landmine. If you agree, then, we can rename
>>>> the
>>>>>>>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>>>>>>>> "DefaultProductionExceptionHandler".
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are different types of exceptions that could occur. Some
>>>>> apply
>>>>>>>>> to
>>>>>>>>>>>>> all records (like Authorization exception) while others are for
>>>>>>>>> single
>>>>>>>>>>>>> records only (like record too large).
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the first category, it seems to not make sense to call the
>>>>>> handle
>>>>>>>>>>>>> but Streams should always fail. If we follow this design, the
>>>> KIP
>>>>>>>>> should
>>>>>>>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>>>>>>>> Just catching up on this KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
>>>> the
>>>>>>>>>>>>> handler
>>>>>>>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying
>>>> to
>>>>>>>>> also
>>>>>>>>>>>>>>> update the KIP accordingly, but cwiki is being slow and
>>>> dropping
>>>>>>>>>>>>>>> connections..... I'll try again a bit later but please
>>>> consider
>>>>>> the
>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That makes sense.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many
>>>> times
>>>>>> by
>>>>>>>>>>>>>>>>> unintentionally shipping code that fails to properly
>>>> implement
>>>>>>>>>>>>> logging. I
>>>>>>>>>>>>>>>>> always discover this at the exact *worst* moment, too.
>>>>> (Normally
>>>>>>>>> at
>>>>>>>>>>>>> 3 AM
>>>>>>>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
>>>>> same
>>>>>>>>> way
>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE,
>> we
>>>>> log
>>>>>>>>> at
>>>>>>>>>>>>> DEBUG
>>>>>>>>>>>>>>>>> level instead of WARN level. Would that alternative be
>>>>> appealing
>>>>>>>>> to
>>>>>>>>>>>>> you?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again
>>>> and
>>>>>> it
>>>>>>>>>>>>> looks
>>>>>>>>>>>>>>>>>> good.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> About whether record collector should still internally log
>>>>> the
>>>>>>>>>>>>> error in
>>>>>>>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>>>>>>>> does. I
>>>>>>>>>>>>>>>>>> personally would prefer only to log if the returned value
>>>> is
>>>>>>>>> FAIL
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
>>>>> the
>>>>>>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>> handler.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <matt@frmr.me
>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
>>>>> this
>>>>>>>>>>>>> KIP as
>>>>>>>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
>>>>> again!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <matt@frmr.me
>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and
>> forgot
>>>>> to
>>>>>>>>>>>>>>>> mention
>>>>>>>>>>>>>>>>>> them
>>>>>>>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the
>>>> oversight.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The logging is actually provided for in the record
>>>>> collector.
>>>>>>>>>>>>>>>>> Whenever
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that
>> it’s
>>>>>>>>>>>>>>>>> *impossible*
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> write a handler that totally suppresses production
>>>>> exceptions
>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> log.
>>>>>>>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>>>>>>>> continue
>>>>>>>>>>>>>>>>> value.
>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>> can add details about those semantics to the KIP as
>> well.
>>>>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You mention a default implementation for the handler
>>>> that
>>>>>>>>>>>>> fails. I
>>>>>>>>>>>>>>>>>>>>> think, this should be part of the public API and thus
>>>>> should
>>>>>>>>>>>>> have
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>>>>>>>> log-and-move-on
>>>>>>>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
>>>>> class
>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>>>> be part of public API (so users can just set in the
>>>>> config)
>>>>>>>>>>>>> with a
>>>>>>>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>>>>>>>> compatible
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
>>>>>> whenever
>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
>>>>> 4148
>>>>>>>>> goes
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>>>>>>>> another
>>>>>>>>>>>>>>>>>> ongoing
>>>>>>>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>>>>>>>> that may touch the same classes that you are working
>>>> on:
>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
>>>>> and
>>>>>>>>> see
>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP
>> as
>>>>>>>>>>>>>>>> currently
>>>>>>>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
>>>>> needs
>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
>>>>>> discuss,
>>>>>>>>>>>>>>>>> please
>>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
>>>>> I'll
>>>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
>>>>> matt@frmr.me
>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be
>> problematic.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this,
>>>> I
>>>>>>>>> either
>>>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down
>>>> so
>>>>>>>>> that I
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When
>>>> we
>>>>>>>>> start
>>>>>>>>>>>>>>>> up
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the
>> message
>>>>> we
>>>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
>>>>> opens
>>>>>>>>> you
>>>>>>>>>>>>>>>> up
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>>>>>>>> messages into the destination topic in an
>> unexpected
>>>>>>>>> order.
>>>>>>>>>>>>>>>>>> However
>>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about
>> it.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
>>>> understanding
>>>>>> is
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
>>>>> while
>>>>>>>>>>>>>>>> trying
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
>>>>> realized
>>>>>>>>> that
>>>>>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
>>>>> may
>>>>>>>>> not
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue
>>>> for
>>>>>>>>>>>>> future
>>>>>>>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation,
>>>> and
>>>>>>>>> made
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
>>>>> more
>>>>>>>>>>>>>>>>> closely
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I
>>>> need
>>>>> to
>>>>>>>>>>>>>>>> reply
>>>>>>>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
>>>> are
>>>>>>>>>>>>> around
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
>> or
>>>>> be
>>>>>>>>>>>>>>>>> generics
>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that
>>>> this
>>>>>>>>> is a
>>>>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
>>>>> would
>>>>>>>>> just
>>>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
>>>> understanding
>>>>>> is
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
>>>>> while
>>>>>>>>>>>>>>>> trying
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>>>>>>> following
>>>>>>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
>>>>> what
>>>>>>>>> you
>>>>>>>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference
>>>> right
>>>>>>>>> now
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>>>>>>>> conversation
>>>>>>>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
>>>>> actual
>>>>>>>>>>>>>>>> added
>>>>>>>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
>>>>> are
>>>>>>>>>>>>>>>> around
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
>>>> or
>>>>> be
>>>>>>>>>>>>>>>>> generics
>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>>>>>>> following
>>>>>>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
>>>>> throw
>>>>>>>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the
>>>> error
>>>>>> and
>>>>>>>>>>>>>>>> drop
>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
>>>>> specific
>>>>>>>>>>>>>>>> "error"
>>>>>>>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization
>> handler
>>>>>> uses
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler"
>>>> for
>>>>> the
>>>>>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think
>> this
>>>>>>>>> should
>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized
>>>> record
>>>>> is
>>>>>>>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly
>>>> doable.
>>>>>>>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we
>> have
>>>>> no
>>>>>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
>>>>> time
>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change
>> to
>>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse
>>>> handle(final
>>>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
>>>>> and
>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context
>> and
>>>>> the
>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
>>>>> Sax <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
>>>>> (parameter
>>>>>>>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to
>>>> get
>>>>> a
>>>>>>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
>>>>> update
>>>>>>>>> the
>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to
>>>> the
>>>>>>>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
>>>>> should
>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
>>>>> Farmer <
>>>>>>>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the
>>>> KIP
>>>>>>>>> that I
>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some
>> feedback
>>>>> from
>>>>>>>>>>>>>>>> folks
>>>>>>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
>>>>> contribute
>>>>>>>>> back.
>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>
>>
> 
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
Metrics: this is a good point.

Note that currently we have two metrics for `skipped-records` on different
levels:

1) on the highest level, the thread-level, we have a `skipped-records`,
that records all the skipped records due to deserialization errors.
2) on the lower processor-node level, we have a
`skippedDueToDeserializationError`, that records the skipped records on
that specific source node due to deserialization errors.


So you can see that 1) does not cover any other scenarios and can just be
thought of as an aggregate of 2) across all the tasks' source nodes.
However, there are other places that can cause a record to be dropped, for
example:

1) https://issues.apache.org/jira/browse/KAFKA-5784: records could be
dropped due to window elapsed.
2) KIP-210: records could be dropped on the producer side.
3) records could be dropped during user-customized processing on errors.


I think improving the skipped records of all these scenarios itself worth
having another KIP; so I'd suggest we do not drag this KIP-210 into this.


Guozhang


On Wed, Dec 13, 2017 at 3:45 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> One more after thought: should we add a metric for this? We also have a
> metric for `skippedDueToDeserializationError-rate` ?
>
>
> -Matthias
>
>
>
> On 12/6/17 7:54 AM, Bill Bejeck wrote:
> > Thanks for the clearly written KIP, no further comments from my end.
> >
> > -Bill
> >
> > On Wed, Dec 6, 2017 at 9:52 AM, Matt Farmer <ma...@frmr.me> wrote:
> >
> >> There is already a vote thread for this KIP. I can bump it so that it’s
> >> towards the top of your inbox.
> >>
> >> With regard to your concerns:
> >>
> >> 1) We do not have the "ProductionExceptionHandler" interface defined in
> the
> >> wiki page, thought it is sort of clear that it is a one-function
> interface
> >> with record and exception. Could you add it?
> >>
> >>
> >> It is defined, it’s just not defined using a code snippet. The KIP
> reads as
> >> follows:
> >>
> >> ===
> >>
> >> A public interface named ProductionExceptionHandler with a single
> method,
> >> handle, that has the following signature:
> >>
> >>    - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
> >>    byte[]> record, Exception exception)
> >>
> >>
> >> ===
> >>
> >> If you’d like me to add a code snippet illustrating this that’s simple
> for
> >> me to do, but it seemed superfluous.
> >>
> >> 2) A quick question about your example code: where would be the "logger"
> >> object be created?
> >>
> >>
> >> SLF4J loggers are typically created as a class member in the class. Such
> >> as:
> >>
> >> private Logger logger = LoggerFactory.getLogger(HelloWorld.class);
> >>
> >> I omit that in my implementation examples for brevity.
> >>
> >> On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com)
> >> wrote:
> >>
> >> Hello Matt,
> >>
> >> Thanks for writing up the KIP. I made a pass over it and here is a few
> >> minor comments. I think you can consider starting a voting thread for
> this
> >> KIP while addressing them.
> >>
> >> 1) We do not have the "ProductionExceptionHandler" interface defined in
> the
> >> wiki page, thought it is sort of clear that it is a one-function
> interface
> >> with record and exception. Could you add it?
> >>
> >> 2) A quick question about your example code: where would be the "logger"
> >> object be created? Note that the implementation of this interface have
> to
> >> give a non-param constructor, or as a static field of the class but in
> that
> >> case you would not be able to log which instance is throwing this error
> (we
> >> may have multiple producers within a single instance, even within a
> >> thread). Just a reminder to consider in your implementation.
> >>
> >>
> >> Guozhang
> >>
> >> On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
> >> wrote:
> >>
> >>> Thanks a lot for the update! Great write-up! Very clearly explained
> what
> >>> the change will look like!
> >>>
> >>> Looks good to me. No further comments from my side.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 12/5/17 9:14 AM, Matt Farmer wrote:
> >>>> I have updated this KIP accordingly.
> >>>>
> >>>> Can you please take a look and let me know if what I wrote looks
> >> correct
> >>> to
> >>>> you?
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >>> Streams+fails+to+produce
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Matt
> >>>>
> >>>>
> >>>> On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
> >>>>
> >>>> Hey Matthias, thanks for getting back to me.
> >>>>
> >>>> That's fine. But if we add it to `test` package, we don't need to talk
> >>>> about it in the KIP. `test` is not public API.
> >>>>
> >>>> Yes, that makes sense. It was in the KIP originally because I was, at
> >> one
> >>>> point, planning on including it. We can remove it now that we’ve
> >> decided
> >>> we
> >>>> won’t include it in the public API.
> >>>>
> >>>> Understood. That makes sense. We should explain this clearly in the
> KIP
> >>>> and maybe log all other following exceptions at DEBUG level?
> >>>>
> >>>>
> >>>> I thought it was clear in the KIP, but I can go back and double check
> >> my
> >>>> wording and revise it to try and make it clearer.
> >>>>
> >>>> I’ll take a look at doing more work on the KIP and the Pull Request
> >>>> tomorrow.
> >>>>
> >>>> Thanks again!
> >>>>
> >>>> On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
> >>> matthias@confluent.io)
> >>>> wrote:
> >>>>
> >>>> Hey,
> >>>>
> >>>> About your questions:
> >>>>
> >>>>>>> Acknowledged, so is ProducerFencedException the only kind of
> >>> exception I
> >>>>>>> need to change my behavior on? Or are there other types I need to
> >>>> check? Is
> >>>>>>> there a comprehensive list somewhere?
> >>>>
> >>>> I cannot think if any other atm. We should list all fatal exceptions
> >> for
> >>>> which we don't call the handler and explain why (exception is "global"
> >>>> and will affect all other records, too | ProducerFenced is
> >> self-healing).
> >>>>
> >>>> We started to collect and categorize exception here (not completed
> >> yet):
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/
> >>> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> >>>> :
> >>>>
> >>>> This list should be a good starting point though.
> >>>>
> >>>>> I include it in the test package because I have tests that assert
> that
> >>> if
> >>>>> the record collector impl encounters an Exception and receives a
> >>> CONTINUE
> >>>>> that it actually does CONTINUE.
> >>>>
> >>>> That's fine. But if we add it to `test` package, we don't need to talk
> >>>> about it in the KIP. `test` is not public API.
> >>>>
> >>>>> I didn't want to invoke the handler in places where the CONTINUE or
> >> FAIL
> >>>>> result would just be ignored. Presumably, after a FAIL has been
> >>> returned,
> >>>>> subsequent exceptions are likely to be repeats or noise from my
> >>>>> understanding of the code paths. If this is incorrect we can revisit.
> >>>>
> >>>> Understood. That makes sense. We should explain this clearly in the
> KIP
> >>>> and maybe log all other following exceptions at DEBUG level?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 12/1/17 11:43 AM, Matt Farmer wrote:
> >>>>> Bump! It's been three days here and I haven't seen any further
> >> feedback.
> >>>>> Eager to get this resolved, approved, and merged. =)
> >>>>>
> >>>>> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>
> >>>>>> Hi there, sorry for the delay in responding. Last week had a holiday
> >>> and
> >>>>>> several busy work days in it so I'm just now getting around to
> >>>> responding.
> >>>>>>
> >>>>>>> We would only exclude
> >>>>>>> exception Streams can handle itself (like ProducerFencedException)
> >> --
> >>>>>>> thus, if the handler has code to react to this, it would not be
> bad,
> >>> as
> >>>>>>> this code is just never called.
> >>>>>> [...]
> >>>>>>> Thus, I am still in favor of not calling the
> >>> ProductionExceptionHandler
> >>>>>>> for fatal exception.
> >>>>>>
> >>>>>> Acknowledged, so is ProducerFencedException the only kind of
> >> exception
> >>> I
> >>>>>> need to change my behavior on? Or are there other types I need to
> >>> check?
> >>>> Is
> >>>>>> there a comprehensive list somewhere?
> >>>>>>
> >>>>>>> About the "always continue" case. Sounds good to me to remove it
> >> (not
> >>>>>>> sure why we need it in test package?)
> >>>>>>
> >>>>>> I include it in the test package because I have tests that assert
> >> that
> >>> if
> >>>>>> the record collector impl encounters an Exception and receives a
> >>> CONTINUE
> >>>>>> that it actually does CONTINUE.
> >>>>>>
> >>>>>>> What is there reasoning for invoking the handler only for the first
> >>>>>>> exception?
> >>>>>>
> >>>>>> I didn't want to invoke the handler in places where the CONTINUE or
> >>> FAIL
> >>>>>> result would just be ignored. Presumably, after a FAIL has been
> >>> returned,
> >>>>>> subsequent exceptions are likely to be repeats or noise from my
> >>>>>> understanding of the code paths. If this is incorrect we can
> revisit.
> >>>>>>
> >>>>>> Once I get the answers to these questions I can make another pass on
> >>> the
> >>>>>> pull request!
> >>>>>>
> >>>>>> Matt
> >>>>>>
> >>>>>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <
> >> matthias@confluent.io
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for following up!
> >>>>>>>
> >>>>>>> One thought about an older reply from you:
> >>>>>>>
> >>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
> >> *just*
> >>>> to
> >>>>>>>>>>> make a decision for streams. There may also be desirable side
> >>>>>>> effects that
> >>>>>>>>>>> users wish to cause when production exceptions occur. There may
> >> be
> >>>>>>>>>>> side-effects that they wish to cause when
> >> AuthenticationExceptions
> >>>>>>> occur,
> >>>>>>>>>>> as well. We should still give them the hooks to preform those
> >> side
> >>>>>>> effects.
> >>>>>>>
> >>>>>>> And your follow up:
> >>>>>>>
> >>>>>>>>> - I think I would rather invoke it for all exceptions that could
> >>>>>>> occur
> >>>>>>>>> from attempting to produce - even those exceptions were returning
> >>>>>>> CONTINUE
> >>>>>>>>> may not be a good idea (e.g. Authorization exception). Until
> there
> >>>>>>> is a
> >>>>>>>>> different type for exceptions that are totally fatal (for example
> >> a
> >>>>>>>>> FatalStreamsException or some sort), maintaining a list of
> >>>>>>> exceptions that
> >>>>>>>>> you can intercept with this handler and exceptions you cannot
> >> would
> >>>>>>> be
> >>>>>>>>> really error-prone and hard to keep correct.
> >>>>>>>
> >>>>>>> I understand what you are saying, however, consider that Streams
> >> needs
> >>>>>>> to die for a fatal exception. Thus, if you call the handler for a
> >>> fatal
> >>>>>>> exception, we would need to ignore the return value and fail --
> this
> >>>>>>> seems to be rather intuitive. Furthermore, users can register an
> >>>>>>> uncaught-exception-handler and side effects for fatal errors can be
> >>>>>>> triggered there.
> >>>>>>>
> >>>>>>> Btw: an AuthorizationException is fatal -- not sure what you mean
> by
> >>> an
> >>>>>>> "totally fatal" exception -- there is no superlative to fatal from
> >> my
> >>>>>>> understanding.
> >>>>>>>
> >>>>>>> About maintaining a list of exceptions: I don't think this is too
> >>> hard,
> >>>>>>> and users also don't need to worry about it IMHO. We would only
> >>> exclude
> >>>>>>> exception Streams can handle itself (like ProducerFencedException)
> >> --
> >>>>>>> thus, if the handler has code to react to this, it would not be
> bad,
> >>> as
> >>>>>>> this code is just never called. In case that there is an exception
> >>>>>>> Streams could actually handle but we still call the handler (ie,
> >> bug),
> >>>>>>> there should be no harm either -- the handler needs to return
> either
> >>>>>>> CONTINUE or FAIL and we would obey. It could only happen, that
> >> Streams
> >>>>>>> dies---as request by the user(!)---even if Streams could actually
> >>> handle
> >>>>>>> the error and move on. But this seems to be not a issue from my
> >> point
> >>> of
> >>>>>>> view.
> >>>>>>>
> >>>>>>> Thus, I am still in favor of not calling the
> >>> ProductionExceptionHandler
> >>>>>>> for fatal exception.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> About the "always continue" case. Sounds good to me to remove it
> >> (not
> >>>>>>> sure why we need it in test package?) and to rename the "failing"
> >>>>>>> handler to "Default" (even if "default" is less descriptive and I
> >>> would
> >>>>>>> still prefer "Fail" in the name).
> >>>>>>>
> >>>>>>>
> >>>>>>> Last question:
> >>>>>>>
> >>>>>>>>> - Continue to *only* invoke it on the first exception that we
> >>>>>>>>> encounter (before sendException is set)
> >>>>>>>
> >>>>>>>
> >>>>>>> What is there reasoning for invoking the handler only for the first
> >>>>>>> exception?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
> >>>>>>>> Alright, here are some updates I'm planning to make after thinking
> >> on
> >>>>>>> this
> >>>>>>>> for awhile:
> >>>>>>>>
> >>>>>>>> - Given that the "always continue" handler isn't something I'd
> >>>>>>> recommend
> >>>>>>>> for production use as is, I'm going to move it into the test
> >>>>>>> namespace and
> >>>>>>>> remove it from mention in the public API.
> >>>>>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler"
> >> to
> >>>>>>>> "DefaultProductionExceptionHandler"
> >>>>>>>> - Given that the API for the exception handler involves being
> >>>>>>> invoked by
> >>>>>>>> streams to make a decision about whether or not to continue — I
> >>>>>>> think that
> >>>>>>>> we should:
> >>>>>>>> - Continue to *only* invoke it on the first exception that we
> >>>>>>>> encounter (before sendException is set)
> >>>>>>>> - Stop invoking it for the self-healing fenced exceptions.
> >>>>>>>> - I think I would rather invoke it for all exceptions that could
> >>>>>>> occur
> >>>>>>>> from attempting to produce - even those exceptions were returning
> >>>>>>> CONTINUE
> >>>>>>>> may not be a good idea (e.g. Authorization exception). Until there
> >>>>>>> is a
> >>>>>>>> different type for exceptions that are totally fatal (for example
> a
> >>>>>>>> FatalStreamsException or some sort), maintaining a list of
> >>>>>>> exceptions that
> >>>>>>>> you can intercept with this handler and exceptions you cannot
> would
> >>>>>>> be
> >>>>>>>> really error-prone and hard to keep correct.
> >>>>>>>> - I'm happy to file a KIP for the creation of this new Exception
> >>>>>>> type
> >>>>>>>> if there is interest.
> >>>>>>>>
> >>>>>>>> @ Matthias — What do you think about the above?
> >>>>>>>>
> >>>>>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>>>
> >>>>>>>>> I responded before reading your code review and didn't see the
> bit
> >>>>>>> about
> >>>>>>>>> how ProducerFencedException is self-healing. This error handling
> >>> logic
> >>>>>>> is
> >>>>>>>>> *quite* confusing to reason about... I think I'm going to sit
> down
> >>>> with
> >>>>>>>>> the code a bit more today, but I'm inclined to think that if the
> >>>> fenced
> >>>>>>>>> exceptions are, indeed, self healing that we still invoke the
> >>> handler
> >>>>>>> but
> >>>>>>>>> ignore its result for those exceptions.
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me>
> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi there,
> >>>>>>>>>>
> >>>>>>>>>> Following up here...
> >>>>>>>>>>
> >>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
> the
> >>>>>>>>>> handler implementation names -- it sounds "cleaner" to me
> without
> >>> it.
> >>>>>>>>>>
> >>>>>>>>>> Let me think on this. I generally prefer expressiveness to
> >>>> clean-ness,
> >>>>>>>>>> and I think that comes out in the names I chose for things. The
> >>> straw
> >>>>>>> man
> >>>>>>>>>> in my head is the person that tried to substitute in the
> >>>>>>> "AlwaysContinue"
> >>>>>>>>>> variant and the "Always" is a trigger to really consider whether
> >> or
> >>>>>>> not
> >>>>>>>>>> they *always* want to try to continue.
> >>>>>>>>>>
> >>>>>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
> >>>> variant
> >>>>>>>>>> isn't something I'd recommend generally in a production system.
> >>>>>>> Ideally
> >>>>>>>>>> these handlers are targeted at handling a specific series of
> >>>>>>> exceptions
> >>>>>>>>>> that a user wants to ignore, and not ignoring all exceptions.
> >> More
> >>> on
> >>>>>>> this
> >>>>>>>>>> in a minute.
> >>>>>>>>>>
> >>>>>>>>>>> For the first category, it seems to not make sense to call the
> >>>>>>> handle but
> >>>>>>>>>> Streams should always fail. If we follow this design, the KIP
> >>> should
> >>>>>>> list
> >>>>>>>>>> all exceptions for which the handler is not called.
> >>>>>>>>>>
> >>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
> >> *just*
> >>> to
> >>>>>>>>>> make a decision for streams. There may also be desirable side
> >>> effects
> >>>>>>> that
> >>>>>>>>>> users wish to cause when production exceptions occur. There may
> >> be
> >>>>>>>>>> side-effects that they wish to cause when
> >> AuthenticationExceptions
> >>>>>>> occur,
> >>>>>>>>>> as well. We should still give them the hooks to preform those
> >> side
> >>>>>>> effects.
> >>>>>>>>>>
> >>>>>>>>>> In light of the above, I'm thinking that the
> >>>>>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
> >>> probably be
> >>>>>>>>>> removed from the public API and moved into tests since that's
> >> where
> >>>>>>> it's
> >>>>>>>>>> most useful. The more I think on it, the more I feel that having
> >>> that
> >>>>>>> in
> >>>>>>>>>> the public API is a landmine. If you agree, then, we can rename
> >> the
> >>>>>>>>>> "AlwaysFailProductionExceptionHandler" to
> >>>>>>>>>> "DefaultProductionExceptionHandler".
> >>>>>>>>>>
> >>>>>>>>>> Thoughts?
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
> >>>>>>> matthias@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I just review the PR, and there is one thing we should discuss.
> >>>>>>>>>>>
> >>>>>>>>>>> There are different types of exceptions that could occur. Some
> >>> apply
> >>>>>>> to
> >>>>>>>>>>> all records (like Authorization exception) while others are for
> >>>>>>> single
> >>>>>>>>>>> records only (like record too large).
> >>>>>>>>>>>
> >>>>>>>>>>> For the first category, it seems to not make sense to call the
> >>>> handle
> >>>>>>>>>>> but Streams should always fail. If we follow this design, the
> >> KIP
> >>>>>>> should
> >>>>>>>>>>> list all exceptions for which the handler is not called.
> >>>>>>>>>>>
> >>>>>>>>>>> WDYT?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> >>>>>>>>>>>> Just catching up on this KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
> >> the
> >>>>>>>>>>> handler
> >>>>>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
> >>>>>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying
> >> to
> >>>>>>> also
> >>>>>>>>>>>>> update the KIP accordingly, but cwiki is being slow and
> >> dropping
> >>>>>>>>>>>>> connections..... I'll try again a bit later but please
> >> consider
> >>>> the
> >>>>>>>>>>> KIP
> >>>>>>>>>>>>> updated for all intents and purposes. heh.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
> >>> wangguoz@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> That makes sense.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
> >>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many
> >> times
> >>>> by
> >>>>>>>>>>>>>>> unintentionally shipping code that fails to properly
> >> implement
> >>>>>>>>>>> logging. I
> >>>>>>>>>>>>>>> always discover this at the exact *worst* moment, too.
> >>> (Normally
> >>>>>>> at
> >>>>>>>>>>> 3 AM
> >>>>>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
> >>> same
> >>>>>>> way
> >>>>>>>>>>> I
> >>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>> probably be convinced to remove it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We could also meet halfway and say that when a customized
> >>>>>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE,
> we
> >>> log
> >>>>>>> at
> >>>>>>>>>>> DEBUG
> >>>>>>>>>>>>>>> level instead of WARN level. Would that alternative be
> >>> appealing
> >>>>>>> to
> >>>>>>>>>>> you?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
> >>>>>>> wangguoz@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again
> >> and
> >>>> it
> >>>>>>>>>>> looks
> >>>>>>>>>>>>>>>> good.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> About whether record collector should still internally log
> >>> the
> >>>>>>>>>>> error in
> >>>>>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
> >>>>>>> does. I
> >>>>>>>>>>>>>>>> personally would prefer only to log if the returned value
> >> is
> >>>>>>> FAIL
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
> >>> the
> >>>>>>>>>>>>>> exception
> >>>>>>>>>>>>>>>> handler.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <matt@frmr.me
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
> >>> this
> >>>>>>>>>>> KIP as
> >>>>>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
> >>> again!
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <matt@frmr.me
> >
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and
> forgot
> >>> to
> >>>>>>>>>>>>>> mention
> >>>>>>>>>>>>>>>> them
> >>>>>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the
> >> oversight.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The logging is actually provided for in the record
> >>> collector.
> >>>>>>>>>>>>>>> Whenever
> >>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that
> it’s
> >>>>>>>>>>>>>>> *impossible*
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> write a handler that totally suppresses production
> >>> exceptions
> >>>>>>>>>>> from
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> log.
> >>>>>>>>>>>>>>>>>> As such, the default continue handler just returns the
> >>>>>>> continue
> >>>>>>>>>>>>>>> value.
> >>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>> can add details about those semantics to the KIP as
> well.
> >>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> >>>>>>>>>>>>>>> matthias@confluent.io
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> One more comment.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> You mention a default implementation for the handler
> >> that
> >>>>>>>>>>> fails. I
> >>>>>>>>>>>>>>>>>>> think, this should be part of the public API and thus
> >>> should
> >>>>>>>>>>> have
> >>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> We could also add a second implementation for the
> >>>>>>>>>>> log-and-move-on
> >>>>>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
> >>> class
> >>>>>>>>>>> should
> >>>>>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>>>>> be part of public API (so users can just set in the
> >>> config)
> >>>>>>>>>>> with a
> >>>>>>>>>>>>>>>>>>> proper name.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
> >>>>>>> compatible
> >>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
> >>>> whenever
> >>>>>>>>>>> one
> >>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
> >>> 4148
> >>>>>>> goes
> >>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> first. :)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> >>>>>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
> >>>>>>> another
> >>>>>>>>>>>>>>>> ongoing
> >>>>>>>>>>>>>>>>> PR
> >>>>>>>>>>>>>>>>>>>>> that may touch the same classes that you are working
> >> on:
> >>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
> >>> and
> >>>>>>> see
> >>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>> compatible with your changes.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
> >>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP
> as
> >>>>>>>>>>>>>> currently
> >>>>>>>>>>>>>>>>>>> written:
> >>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
> >>> needs
> >>>>>>>>>>>>>> some
> >>>>>>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>>>>>>>>> added,
> >>>>>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
> >>>> discuss,
> >>>>>>>>>>>>>>> please
> >>>>>>>>>>>>>>>>> let
> >>>>>>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
> >>> I'll
> >>>>>>>>>>>>>>> probably
> >>>>>>>>>>>>>>>>>>> start
> >>>>>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> [VOTE] thread.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
> >>> matt@frmr.me
> >>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be
> problematic.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this,
> >> I
> >>>>>>> either
> >>>>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down
> >> so
> >>>>>>> that I
> >>>>>>>>>>>>>>> can
> >>>>>>>>>>>>>>>>> spin
> >>>>>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When
> >> we
> >>>>>>> start
> >>>>>>>>>>>>>> up
> >>>>>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the
> message
> >>> we
> >>>>>>>>>>>>>> failed
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> produce
> >>>>>>>>>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
> >>> opens
> >>>>>>> you
> >>>>>>>>>>>>>> up
> >>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> putting
> >>>>>>>>>>>>>>>>>>>>>>> messages into the destination topic in an
> unexpected
> >>>>>>> order.
> >>>>>>>>>>>>>>>> However
> >>>>>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about
> it.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> >>>>>>>>>>>>>>>> wangguoz@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
> >> understanding
> >>>> is
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> >>> while
> >>>>>>>>>>>>>> trying
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
> >>> realized
> >>>>>>> that
> >>>>>>>>>>>>>>>> since
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
> >>> may
> >>>>>>> not
> >>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue
> >> for
> >>>>>>>>>>> future
> >>>>>>>>>>>>>>>>>>>>> processing.
> >>>>>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
> >>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation,
> >> and
> >>>>>>> made
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
> >>> more
> >>>>>>>>>>>>>>> closely
> >>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I
> >> need
> >>> to
> >>>>>>>>>>>>>> reply
> >>>>>>>>>>>>>>>>>>> to.....
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> >> are
> >>>>>>>>>>> around
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
> or
> >>> be
> >>>>>>>>>>>>>>> generics
> >>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that
> >> this
> >>>>>>> is a
> >>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
> >>> would
> >>>>>>> just
> >>>>>>>>>>>>>>> make
> >>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>>>>> harder
> >>>>>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>>> well
> >>>>>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
> >> understanding
> >>>> is
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> >>> while
> >>>>>>>>>>>>>> trying
> >>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>>>>> following
> >>>>>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
> >>> what
> >>>>>>> you
> >>>>>>>>>>>>>>>> think.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
> >>>>>>> matt@frmr.me
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference
> >> right
> >>>>>>> now
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
> >>>>>>>>>>> conversation
> >>>>>>>>>>>>>>>> later
> >>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>> week.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> >>>>>>>>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
> >>>>>>>>>>>>>>>>> ProductionExceptionHandlerResp
> >>>>>>>>>>>>>>>>>>>>>> onse,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
> >>> actual
> >>>>>>>>>>>>>> added
> >>>>>>>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>>>>> described).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> >>> are
> >>>>>>>>>>>>>> around
> >>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
> >> or
> >>> be
> >>>>>>>>>>>>>>> generics
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>>>>> well in
> >>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>>>>> following
> >>>>>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
> >>> throw
> >>>>>>>>>>>>>>> exception
> >>>>>>>>>>>>>>>>>>>>>>>>> immediately
> >>>>>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the
> >> error
> >>>> and
> >>>>>>>>>>>>>> drop
> >>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
> >>> specific
> >>>>>>>>>>>>>> "error"
> >>>>>>>>>>>>>>>>> Kafka
> >>>>>>>>>>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> >>>>>>>>>>>>>>>>> monitoring
> >>>>>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>>>> monitoring.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> >>>>>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization
> handler
> >>>> uses
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler"
> >> for
> >>> the
> >>>>>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>>>>> name. No
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think
> this
> >>>>>>> should
> >>>>>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> >>>>>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized
> >> record
> >>> is
> >>>>>>>>>>>>>>>> possible,
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly
> >> doable.
> >>>>>>>>>>>>>> However,
> >>>>>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>>>>> we're
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we
> have
> >>> no
> >>>>>>>>>>>>>>> guarantee
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
> >>> time
> >>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> particular
> >>>>>>>>>>>>>>>>>>>>>>>>>>> handler
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change
> to
> >>>>>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse
> >> handle(final
> >>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> >>>>>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
> >>> and
> >>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>> type
> >>>>>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>>>>> I’m
> >>>>>>>>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context
> and
> >>> the
> >>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> itself
> >>>>>>>>>>>>>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
> >>> Sax <
> >>>>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
> >>> (parameter
> >>>>>>>>>>>>>> naming,
> >>>>>>>>>>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>>>>>>>>>>>>> types,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to
> >> get
> >>> a
> >>>>>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> experience.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
> >>> update
> >>>>>>> the
> >>>>>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>>>>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to
> >> the
> >>>>>>> KIP ?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
> >>> should
> >>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> specified.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
> >>> Farmer <
> >>>>>>>>>>>>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the
> >> KIP
> >>>>>>> that I
> >>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>>>> filed
> >>>>>>>>>>>>>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> >>>>>>>>>>>>>>> error+handling++when+Kafka+
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some
> feedback
> >>> from
> >>>>>>>>>>>>>> folks
> >>>>>>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
> >>> contribute
> >>>>>>> back.
> >>>>>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
One more after thought: should we add a metric for this? We also have a
metric for `skippedDueToDeserializationError-rate` ?


-Matthias



On 12/6/17 7:54 AM, Bill Bejeck wrote:
> Thanks for the clearly written KIP, no further comments from my end.
> 
> -Bill
> 
> On Wed, Dec 6, 2017 at 9:52 AM, Matt Farmer <ma...@frmr.me> wrote:
> 
>> There is already a vote thread for this KIP. I can bump it so that it’s
>> towards the top of your inbox.
>>
>> With regard to your concerns:
>>
>> 1) We do not have the "ProductionExceptionHandler" interface defined in the
>> wiki page, thought it is sort of clear that it is a one-function interface
>> with record and exception. Could you add it?
>>
>>
>> It is defined, it’s just not defined using a code snippet. The KIP reads as
>> follows:
>>
>> ===
>>
>> A public interface named ProductionExceptionHandler with a single method,
>> handle, that has the following signature:
>>
>>    - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
>>    byte[]> record, Exception exception)
>>
>>
>> ===
>>
>> If you’d like me to add a code snippet illustrating this that’s simple for
>> me to do, but it seemed superfluous.
>>
>> 2) A quick question about your example code: where would be the "logger"
>> object be created?
>>
>>
>> SLF4J loggers are typically created as a class member in the class. Such
>> as:
>>
>> private Logger logger = LoggerFactory.getLogger(HelloWorld.class);
>>
>> I omit that in my implementation examples for brevity.
>>
>> On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com)
>> wrote:
>>
>> Hello Matt,
>>
>> Thanks for writing up the KIP. I made a pass over it and here is a few
>> minor comments. I think you can consider starting a voting thread for this
>> KIP while addressing them.
>>
>> 1) We do not have the "ProductionExceptionHandler" interface defined in the
>> wiki page, thought it is sort of clear that it is a one-function interface
>> with record and exception. Could you add it?
>>
>> 2) A quick question about your example code: where would be the "logger"
>> object be created? Note that the implementation of this interface have to
>> give a non-param constructor, or as a static field of the class but in that
>> case you would not be able to log which instance is throwing this error (we
>> may have multiple producers within a single instance, even within a
>> thread). Just a reminder to consider in your implementation.
>>
>>
>> Guozhang
>>
>> On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks a lot for the update! Great write-up! Very clearly explained what
>>> the change will look like!
>>>
>>> Looks good to me. No further comments from my side.
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 12/5/17 9:14 AM, Matt Farmer wrote:
>>>> I have updated this KIP accordingly.
>>>>
>>>> Can you please take a look and let me know if what I wrote looks
>> correct
>>> to
>>>> you?
>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>> Streams+fails+to+produce
>>>>
>>>> Thanks!
>>>>
>>>> Matt
>>>>
>>>>
>>>> On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
>>>>
>>>> Hey Matthias, thanks for getting back to me.
>>>>
>>>> That's fine. But if we add it to `test` package, we don't need to talk
>>>> about it in the KIP. `test` is not public API.
>>>>
>>>> Yes, that makes sense. It was in the KIP originally because I was, at
>> one
>>>> point, planning on including it. We can remove it now that we’ve
>> decided
>>> we
>>>> won’t include it in the public API.
>>>>
>>>> Understood. That makes sense. We should explain this clearly in the KIP
>>>> and maybe log all other following exceptions at DEBUG level?
>>>>
>>>>
>>>> I thought it was clear in the KIP, but I can go back and double check
>> my
>>>> wording and revise it to try and make it clearer.
>>>>
>>>> I’ll take a look at doing more work on the KIP and the Pull Request
>>>> tomorrow.
>>>>
>>>> Thanks again!
>>>>
>>>> On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
>>> matthias@confluent.io)
>>>> wrote:
>>>>
>>>> Hey,
>>>>
>>>> About your questions:
>>>>
>>>>>>> Acknowledged, so is ProducerFencedException the only kind of
>>> exception I
>>>>>>> need to change my behavior on? Or are there other types I need to
>>>> check? Is
>>>>>>> there a comprehensive list somewhere?
>>>>
>>>> I cannot think if any other atm. We should list all fatal exceptions
>> for
>>>> which we don't call the handler and explain why (exception is "global"
>>>> and will affect all other records, too | ProducerFenced is
>> self-healing).
>>>>
>>>> We started to collect and categorize exception here (not completed
>> yet):
>>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
>>>> :
>>>>
>>>> This list should be a good starting point though.
>>>>
>>>>> I include it in the test package because I have tests that assert that
>>> if
>>>>> the record collector impl encounters an Exception and receives a
>>> CONTINUE
>>>>> that it actually does CONTINUE.
>>>>
>>>> That's fine. But if we add it to `test` package, we don't need to talk
>>>> about it in the KIP. `test` is not public API.
>>>>
>>>>> I didn't want to invoke the handler in places where the CONTINUE or
>> FAIL
>>>>> result would just be ignored. Presumably, after a FAIL has been
>>> returned,
>>>>> subsequent exceptions are likely to be repeats or noise from my
>>>>> understanding of the code paths. If this is incorrect we can revisit.
>>>>
>>>> Understood. That makes sense. We should explain this clearly in the KIP
>>>> and maybe log all other following exceptions at DEBUG level?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 12/1/17 11:43 AM, Matt Farmer wrote:
>>>>> Bump! It's been three days here and I haven't seen any further
>> feedback.
>>>>> Eager to get this resolved, approved, and merged. =)
>>>>>
>>>>> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hi there, sorry for the delay in responding. Last week had a holiday
>>> and
>>>>>> several busy work days in it so I'm just now getting around to
>>>> responding.
>>>>>>
>>>>>>> We would only exclude
>>>>>>> exception Streams can handle itself (like ProducerFencedException)
>> --
>>>>>>> thus, if the handler has code to react to this, it would not be bad,
>>> as
>>>>>>> this code is just never called.
>>>>>> [...]
>>>>>>> Thus, I am still in favor of not calling the
>>> ProductionExceptionHandler
>>>>>>> for fatal exception.
>>>>>>
>>>>>> Acknowledged, so is ProducerFencedException the only kind of
>> exception
>>> I
>>>>>> need to change my behavior on? Or are there other types I need to
>>> check?
>>>> Is
>>>>>> there a comprehensive list somewhere?
>>>>>>
>>>>>>> About the "always continue" case. Sounds good to me to remove it
>> (not
>>>>>>> sure why we need it in test package?)
>>>>>>
>>>>>> I include it in the test package because I have tests that assert
>> that
>>> if
>>>>>> the record collector impl encounters an Exception and receives a
>>> CONTINUE
>>>>>> that it actually does CONTINUE.
>>>>>>
>>>>>>> What is there reasoning for invoking the handler only for the first
>>>>>>> exception?
>>>>>>
>>>>>> I didn't want to invoke the handler in places where the CONTINUE or
>>> FAIL
>>>>>> result would just be ignored. Presumably, after a FAIL has been
>>> returned,
>>>>>> subsequent exceptions are likely to be repeats or noise from my
>>>>>> understanding of the code paths. If this is incorrect we can revisit.
>>>>>>
>>>>>> Once I get the answers to these questions I can make another pass on
>>> the
>>>>>> pull request!
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <
>> matthias@confluent.io
>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for following up!
>>>>>>>
>>>>>>> One thought about an older reply from you:
>>>>>>>
>>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
>> *just*
>>>> to
>>>>>>>>>>> make a decision for streams. There may also be desirable side
>>>>>>> effects that
>>>>>>>>>>> users wish to cause when production exceptions occur. There may
>> be
>>>>>>>>>>> side-effects that they wish to cause when
>> AuthenticationExceptions
>>>>>>> occur,
>>>>>>>>>>> as well. We should still give them the hooks to preform those
>> side
>>>>>>> effects.
>>>>>>>
>>>>>>> And your follow up:
>>>>>>>
>>>>>>>>> - I think I would rather invoke it for all exceptions that could
>>>>>>> occur
>>>>>>>>> from attempting to produce - even those exceptions were returning
>>>>>>> CONTINUE
>>>>>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>>>>>> is a
>>>>>>>>> different type for exceptions that are totally fatal (for example
>> a
>>>>>>>>> FatalStreamsException or some sort), maintaining a list of
>>>>>>> exceptions that
>>>>>>>>> you can intercept with this handler and exceptions you cannot
>> would
>>>>>>> be
>>>>>>>>> really error-prone and hard to keep correct.
>>>>>>>
>>>>>>> I understand what you are saying, however, consider that Streams
>> needs
>>>>>>> to die for a fatal exception. Thus, if you call the handler for a
>>> fatal
>>>>>>> exception, we would need to ignore the return value and fail -- this
>>>>>>> seems to be rather intuitive. Furthermore, users can register an
>>>>>>> uncaught-exception-handler and side effects for fatal errors can be
>>>>>>> triggered there.
>>>>>>>
>>>>>>> Btw: an AuthorizationException is fatal -- not sure what you mean by
>>> an
>>>>>>> "totally fatal" exception -- there is no superlative to fatal from
>> my
>>>>>>> understanding.
>>>>>>>
>>>>>>> About maintaining a list of exceptions: I don't think this is too
>>> hard,
>>>>>>> and users also don't need to worry about it IMHO. We would only
>>> exclude
>>>>>>> exception Streams can handle itself (like ProducerFencedException)
>> --
>>>>>>> thus, if the handler has code to react to this, it would not be bad,
>>> as
>>>>>>> this code is just never called. In case that there is an exception
>>>>>>> Streams could actually handle but we still call the handler (ie,
>> bug),
>>>>>>> there should be no harm either -- the handler needs to return either
>>>>>>> CONTINUE or FAIL and we would obey. It could only happen, that
>> Streams
>>>>>>> dies---as request by the user(!)---even if Streams could actually
>>> handle
>>>>>>> the error and move on. But this seems to be not a issue from my
>> point
>>> of
>>>>>>> view.
>>>>>>>
>>>>>>> Thus, I am still in favor of not calling the
>>> ProductionExceptionHandler
>>>>>>> for fatal exception.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> About the "always continue" case. Sounds good to me to remove it
>> (not
>>>>>>> sure why we need it in test package?) and to rename the "failing"
>>>>>>> handler to "Default" (even if "default" is less descriptive and I
>>> would
>>>>>>> still prefer "Fail" in the name).
>>>>>>>
>>>>>>>
>>>>>>> Last question:
>>>>>>>
>>>>>>>>> - Continue to *only* invoke it on the first exception that we
>>>>>>>>> encounter (before sendException is set)
>>>>>>>
>>>>>>>
>>>>>>> What is there reasoning for invoking the handler only for the first
>>>>>>> exception?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>>>>>> Alright, here are some updates I'm planning to make after thinking
>> on
>>>>>>> this
>>>>>>>> for awhile:
>>>>>>>>
>>>>>>>> - Given that the "always continue" handler isn't something I'd
>>>>>>> recommend
>>>>>>>> for production use as is, I'm going to move it into the test
>>>>>>> namespace and
>>>>>>>> remove it from mention in the public API.
>>>>>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler"
>> to
>>>>>>>> "DefaultProductionExceptionHandler"
>>>>>>>> - Given that the API for the exception handler involves being
>>>>>>> invoked by
>>>>>>>> streams to make a decision about whether or not to continue — I
>>>>>>> think that
>>>>>>>> we should:
>>>>>>>> - Continue to *only* invoke it on the first exception that we
>>>>>>>> encounter (before sendException is set)
>>>>>>>> - Stop invoking it for the self-healing fenced exceptions.
>>>>>>>> - I think I would rather invoke it for all exceptions that could
>>>>>>> occur
>>>>>>>> from attempting to produce - even those exceptions were returning
>>>>>>> CONTINUE
>>>>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>>>>>> is a
>>>>>>>> different type for exceptions that are totally fatal (for example a
>>>>>>>> FatalStreamsException or some sort), maintaining a list of
>>>>>>> exceptions that
>>>>>>>> you can intercept with this handler and exceptions you cannot would
>>>>>>> be
>>>>>>>> really error-prone and hard to keep correct.
>>>>>>>> - I'm happy to file a KIP for the creation of this new Exception
>>>>>>> type
>>>>>>>> if there is interest.
>>>>>>>>
>>>>>>>> @ Matthias — What do you think about the above?
>>>>>>>>
>>>>>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>>
>>>>>>>>> I responded before reading your code review and didn't see the bit
>>>>>>> about
>>>>>>>>> how ProducerFencedException is self-healing. This error handling
>>> logic
>>>>>>> is
>>>>>>>>> *quite* confusing to reason about... I think I'm going to sit down
>>>> with
>>>>>>>>> the code a bit more today, but I'm inclined to think that if the
>>>> fenced
>>>>>>>>> exceptions are, indeed, self healing that we still invoke the
>>> handler
>>>>>>> but
>>>>>>>>> ignore its result for those exceptions.
>>>>>>>>>
>>>>>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> Following up here...
>>>>>>>>>>
>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>>>>> handler implementation names -- it sounds "cleaner" to me without
>>> it.
>>>>>>>>>>
>>>>>>>>>> Let me think on this. I generally prefer expressiveness to
>>>> clean-ness,
>>>>>>>>>> and I think that comes out in the names I chose for things. The
>>> straw
>>>>>>> man
>>>>>>>>>> in my head is the person that tried to substitute in the
>>>>>>> "AlwaysContinue"
>>>>>>>>>> variant and the "Always" is a trigger to really consider whether
>> or
>>>>>>> not
>>>>>>>>>> they *always* want to try to continue.
>>>>>>>>>>
>>>>>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
>>>> variant
>>>>>>>>>> isn't something I'd recommend generally in a production system.
>>>>>>> Ideally
>>>>>>>>>> these handlers are targeted at handling a specific series of
>>>>>>> exceptions
>>>>>>>>>> that a user wants to ignore, and not ignoring all exceptions.
>> More
>>> on
>>>>>>> this
>>>>>>>>>> in a minute.
>>>>>>>>>>
>>>>>>>>>>> For the first category, it seems to not make sense to call the
>>>>>>> handle but
>>>>>>>>>> Streams should always fail. If we follow this design, the KIP
>>> should
>>>>>>> list
>>>>>>>>>> all exceptions for which the handler is not called.
>>>>>>>>>>
>>>>>>>>>> I strongly disagree here. The purpose of this handler isn't
>> *just*
>>> to
>>>>>>>>>> make a decision for streams. There may also be desirable side
>>> effects
>>>>>>> that
>>>>>>>>>> users wish to cause when production exceptions occur. There may
>> be
>>>>>>>>>> side-effects that they wish to cause when
>> AuthenticationExceptions
>>>>>>> occur,
>>>>>>>>>> as well. We should still give them the hooks to preform those
>> side
>>>>>>> effects.
>>>>>>>>>>
>>>>>>>>>> In light of the above, I'm thinking that the
>>>>>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
>>> probably be
>>>>>>>>>> removed from the public API and moved into tests since that's
>> where
>>>>>>> it's
>>>>>>>>>> most useful. The more I think on it, the more I feel that having
>>> that
>>>>>>> in
>>>>>>>>>> the public API is a landmine. If you agree, then, we can rename
>> the
>>>>>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>>>>>> "DefaultProductionExceptionHandler".
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>>>>>> matthias@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>>>>>
>>>>>>>>>>> There are different types of exceptions that could occur. Some
>>> apply
>>>>>>> to
>>>>>>>>>>> all records (like Authorization exception) while others are for
>>>>>>> single
>>>>>>>>>>> records only (like record too large).
>>>>>>>>>>>
>>>>>>>>>>> For the first category, it seems to not make sense to call the
>>>> handle
>>>>>>>>>>> but Streams should always fail. If we follow this design, the
>> KIP
>>>>>>> should
>>>>>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>>>>>
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>>>>>> Just catching up on this KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
>> the
>>>>>>>>>>> handler
>>>>>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying
>> to
>>>>>>> also
>>>>>>>>>>>>> update the KIP accordingly, but cwiki is being slow and
>> dropping
>>>>>>>>>>>>> connections..... I'll try again a bit later but please
>> consider
>>>> the
>>>>>>>>>>> KIP
>>>>>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
>>> wangguoz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That makes sense.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many
>> times
>>>> by
>>>>>>>>>>>>>>> unintentionally shipping code that fails to properly
>> implement
>>>>>>>>>>> logging. I
>>>>>>>>>>>>>>> always discover this at the exact *worst* moment, too.
>>> (Normally
>>>>>>> at
>>>>>>>>>>> 3 AM
>>>>>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
>>> same
>>>>>>> way
>>>>>>>>>>> I
>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we
>>> log
>>>>>>> at
>>>>>>>>>>> DEBUG
>>>>>>>>>>>>>>> level instead of WARN level. Would that alternative be
>>> appealing
>>>>>>> to
>>>>>>>>>>> you?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again
>> and
>>>> it
>>>>>>>>>>> looks
>>>>>>>>>>>>>>>> good.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About whether record collector should still internally log
>>> the
>>>>>>>>>>> error in
>>>>>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>>>>>> does. I
>>>>>>>>>>>>>>>> personally would prefer only to log if the returned value
>> is
>>>>>>> FAIL
>>>>>>>>>>> to
>>>>>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
>>> the
>>>>>>>>>>>>>> exception
>>>>>>>>>>>>>>>> handler.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
>>> this
>>>>>>>>>>> KIP as
>>>>>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
>>> again!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot
>>> to
>>>>>>>>>>>>>> mention
>>>>>>>>>>>>>>>> them
>>>>>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the
>> oversight.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The logging is actually provided for in the record
>>> collector.
>>>>>>>>>>>>>>> Whenever
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>>>>>>>>> *impossible*
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> write a handler that totally suppresses production
>>> exceptions
>>>>>>>>>>> from
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> log.
>>>>>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>>>>>> continue
>>>>>>>>>>>>>>> value.
>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> You mention a default implementation for the handler
>> that
>>>>>>>>>>> fails. I
>>>>>>>>>>>>>>>>>>> think, this should be part of the public API and thus
>>> should
>>>>>>>>>>> have
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>>>>>> log-and-move-on
>>>>>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
>>> class
>>>>>>>>>>> should
>>>>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>>>>> be part of public API (so users can just set in the
>>> config)
>>>>>>>>>>> with a
>>>>>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>>>>>> compatible
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
>>>> whenever
>>>>>>>>>>> one
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
>>> 4148
>>>>>>> goes
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>>>>>> another
>>>>>>>>>>>>>>>> ongoing
>>>>>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>>>>>> that may touch the same classes that you are working
>> on:
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
>>> and
>>>>>>> see
>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>>>>>>>>> currently
>>>>>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
>>> needs
>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
>>>> discuss,
>>>>>>>>>>>>>>> please
>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
>>> I'll
>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
>>> matt@frmr.me
>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this,
>> I
>>>>>>> either
>>>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down
>> so
>>>>>>> that I
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When
>> we
>>>>>>> start
>>>>>>>>>>>>>> up
>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message
>>> we
>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
>>> opens
>>>>>>> you
>>>>>>>>>>>>>> up
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>>>>>>> order.
>>>>>>>>>>>>>>>> However
>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
>> understanding
>>>> is
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
>>> while
>>>>>>>>>>>>>> trying
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
>>> realized
>>>>>>> that
>>>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
>>> may
>>>>>>> not
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue
>> for
>>>>>>>>>>> future
>>>>>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation,
>> and
>>>>>>> made
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
>>> more
>>>>>>>>>>>>>>> closely
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I
>> need
>>> to
>>>>>>>>>>>>>> reply
>>>>>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
>> are
>>>>>>>>>>> around
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
>>> be
>>>>>>>>>>>>>>> generics
>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that
>> this
>>>>>>> is a
>>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
>>> would
>>>>>>> just
>>>>>>>>>>>>>>> make
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
>> understanding
>>>> is
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
>>> while
>>>>>>>>>>>>>> trying
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>>>>> following
>>>>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
>>> what
>>>>>>> you
>>>>>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference
>> right
>>>>>>> now
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>>>>>> conversation
>>>>>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
>>> actual
>>>>>>>>>>>>>> added
>>>>>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
>>> are
>>>>>>>>>>>>>> around
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
>> or
>>> be
>>>>>>>>>>>>>>> generics
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>>>>> following
>>>>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
>>> throw
>>>>>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the
>> error
>>>> and
>>>>>>>>>>>>>> drop
>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
>>> specific
>>>>>>>>>>>>>> "error"
>>>>>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
>>>> uses
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler"
>> for
>>> the
>>>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>>>>>>> should
>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized
>> record
>>> is
>>>>>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly
>> doable.
>>>>>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have
>>> no
>>>>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
>>> time
>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse
>> handle(final
>>>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
>>> and
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and
>>> the
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
>>> Sax <
>>>>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
>>> (parameter
>>>>>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to
>> get
>>> a
>>>>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
>>> update
>>>>>>> the
>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to
>> the
>>>>>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
>>> should
>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
>>> Farmer <
>>>>>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the
>> KIP
>>>>>>> that I
>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback
>>> from
>>>>>>>>>>>>>> folks
>>>>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
>>> contribute
>>>>>>> back.
>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> -- Guozhang
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Bill Bejeck <bb...@gmail.com>.
Thanks for the clearly written KIP, no further comments from my end.

-Bill

On Wed, Dec 6, 2017 at 9:52 AM, Matt Farmer <ma...@frmr.me> wrote:

> There is already a vote thread for this KIP. I can bump it so that it’s
> towards the top of your inbox.
>
> With regard to your concerns:
>
> 1) We do not have the "ProductionExceptionHandler" interface defined in the
> wiki page, thought it is sort of clear that it is a one-function interface
> with record and exception. Could you add it?
>
>
> It is defined, it’s just not defined using a code snippet. The KIP reads as
> follows:
>
> ===
>
> A public interface named ProductionExceptionHandler with a single method,
> handle, that has the following signature:
>
>    - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
>    byte[]> record, Exception exception)
>
>
> ===
>
> If you’d like me to add a code snippet illustrating this that’s simple for
> me to do, but it seemed superfluous.
>
> 2) A quick question about your example code: where would be the "logger"
> object be created?
>
>
> SLF4J loggers are typically created as a class member in the class. Such
> as:
>
> private Logger logger = LoggerFactory.getLogger(HelloWorld.class);
>
> I omit that in my implementation examples for brevity.
>
> On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com)
> wrote:
>
> Hello Matt,
>
> Thanks for writing up the KIP. I made a pass over it and here is a few
> minor comments. I think you can consider starting a voting thread for this
> KIP while addressing them.
>
> 1) We do not have the "ProductionExceptionHandler" interface defined in the
> wiki page, thought it is sort of clear that it is a one-function interface
> with record and exception. Could you add it?
>
> 2) A quick question about your example code: where would be the "logger"
> object be created? Note that the implementation of this interface have to
> give a non-param constructor, or as a static field of the class but in that
> case you would not be able to log which instance is throwing this error (we
> may have multiple producers within a single instance, even within a
> thread). Just a reminder to consider in your implementation.
>
>
> Guozhang
>
> On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Thanks a lot for the update! Great write-up! Very clearly explained what
> > the change will look like!
> >
> > Looks good to me. No further comments from my side.
> >
> >
> > -Matthias
> >
> >
> > On 12/5/17 9:14 AM, Matt Farmer wrote:
> > > I have updated this KIP accordingly.
> > >
> > > Can you please take a look and let me know if what I wrote looks
> correct
> > to
> > > you?
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 210+-+Provide+for+custom+error+handling++when+Kafka+
> > Streams+fails+to+produce
> > >
> > > Thanks!
> > >
> > > Matt
> > >
> > >
> > > On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
> > >
> > > Hey Matthias, thanks for getting back to me.
> > >
> > > That's fine. But if we add it to `test` package, we don't need to talk
> > > about it in the KIP. `test` is not public API.
> > >
> > > Yes, that makes sense. It was in the KIP originally because I was, at
> one
> > > point, planning on including it. We can remove it now that we’ve
> decided
> > we
> > > won’t include it in the public API.
> > >
> > > Understood. That makes sense. We should explain this clearly in the KIP
> > > and maybe log all other following exceptions at DEBUG level?
> > >
> > >
> > > I thought it was clear in the KIP, but I can go back and double check
> my
> > > wording and revise it to try and make it clearer.
> > >
> > > I’ll take a look at doing more work on the KIP and the Pull Request
> > > tomorrow.
> > >
> > > Thanks again!
> > >
> > > On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
> > matthias@confluent.io)
> > > wrote:
> > >
> > > Hey,
> > >
> > > About your questions:
> > >
> > >>>> Acknowledged, so is ProducerFencedException the only kind of
> > exception I
> > >>>> need to change my behavior on? Or are there other types I need to
> > > check? Is
> > >>>> there a comprehensive list somewhere?
> > >
> > > I cannot think if any other atm. We should list all fatal exceptions
> for
> > > which we don't call the handler and explain why (exception is "global"
> > > and will affect all other records, too | ProducerFenced is
> self-healing).
> > >
> > > We started to collect and categorize exception here (not completed
> yet):
> > > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> > > :
> > >
> > > This list should be a good starting point though.
> > >
> > >> I include it in the test package because I have tests that assert that
> > if
> > >> the record collector impl encounters an Exception and receives a
> > CONTINUE
> > >> that it actually does CONTINUE.
> > >
> > > That's fine. But if we add it to `test` package, we don't need to talk
> > > about it in the KIP. `test` is not public API.
> > >
> > >> I didn't want to invoke the handler in places where the CONTINUE or
> FAIL
> > >> result would just be ignored. Presumably, after a FAIL has been
> > returned,
> > >> subsequent exceptions are likely to be repeats or noise from my
> > >> understanding of the code paths. If this is incorrect we can revisit.
> > >
> > > Understood. That makes sense. We should explain this clearly in the KIP
> > > and maybe log all other following exceptions at DEBUG level?
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 12/1/17 11:43 AM, Matt Farmer wrote:
> > >> Bump! It's been three days here and I haven't seen any further
> feedback.
> > >> Eager to get this resolved, approved, and merged. =)
> > >>
> > >> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
> > >>
> > >>> Hi there, sorry for the delay in responding. Last week had a holiday
> > and
> > >>> several busy work days in it so I'm just now getting around to
> > > responding.
> > >>>
> > >>>> We would only exclude
> > >>>> exception Streams can handle itself (like ProducerFencedException)
> --
> > >>>> thus, if the handler has code to react to this, it would not be bad,
> > as
> > >>>> this code is just never called.
> > >>> [...]
> > >>>> Thus, I am still in favor of not calling the
> > ProductionExceptionHandler
> > >>>> for fatal exception.
> > >>>
> > >>> Acknowledged, so is ProducerFencedException the only kind of
> exception
> > I
> > >>> need to change my behavior on? Or are there other types I need to
> > check?
> > > Is
> > >>> there a comprehensive list somewhere?
> > >>>
> > >>>> About the "always continue" case. Sounds good to me to remove it
> (not
> > >>>> sure why we need it in test package?)
> > >>>
> > >>> I include it in the test package because I have tests that assert
> that
> > if
> > >>> the record collector impl encounters an Exception and receives a
> > CONTINUE
> > >>> that it actually does CONTINUE.
> > >>>
> > >>>> What is there reasoning for invoking the handler only for the first
> > >>>> exception?
> > >>>
> > >>> I didn't want to invoke the handler in places where the CONTINUE or
> > FAIL
> > >>> result would just be ignored. Presumably, after a FAIL has been
> > returned,
> > >>> subsequent exceptions are likely to be repeats or noise from my
> > >>> understanding of the code paths. If this is incorrect we can revisit.
> > >>>
> > >>> Once I get the answers to these questions I can make another pass on
> > the
> > >>> pull request!
> > >>>
> > >>> Matt
> > >>>
> > >>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <
> matthias@confluent.io
> > >
> > >>> wrote:
> > >>>
> > >>>> Thanks for following up!
> > >>>>
> > >>>> One thought about an older reply from you:
> > >>>>
> > >>>>>>>> I strongly disagree here. The purpose of this handler isn't
> *just*
> > > to
> > >>>>>>>> make a decision for streams. There may also be desirable side
> > >>>> effects that
> > >>>>>>>> users wish to cause when production exceptions occur. There may
> be
> > >>>>>>>> side-effects that they wish to cause when
> AuthenticationExceptions
> > >>>> occur,
> > >>>>>>>> as well. We should still give them the hooks to preform those
> side
> > >>>> effects.
> > >>>>
> > >>>> And your follow up:
> > >>>>
> > >>>>>> - I think I would rather invoke it for all exceptions that could
> > >>>> occur
> > >>>>>> from attempting to produce - even those exceptions were returning
> > >>>> CONTINUE
> > >>>>>> may not be a good idea (e.g. Authorization exception). Until there
> > >>>> is a
> > >>>>>> different type for exceptions that are totally fatal (for example
> a
> > >>>>>> FatalStreamsException or some sort), maintaining a list of
> > >>>> exceptions that
> > >>>>>> you can intercept with this handler and exceptions you cannot
> would
> > >>>> be
> > >>>>>> really error-prone and hard to keep correct.
> > >>>>
> > >>>> I understand what you are saying, however, consider that Streams
> needs
> > >>>> to die for a fatal exception. Thus, if you call the handler for a
> > fatal
> > >>>> exception, we would need to ignore the return value and fail -- this
> > >>>> seems to be rather intuitive. Furthermore, users can register an
> > >>>> uncaught-exception-handler and side effects for fatal errors can be
> > >>>> triggered there.
> > >>>>
> > >>>> Btw: an AuthorizationException is fatal -- not sure what you mean by
> > an
> > >>>> "totally fatal" exception -- there is no superlative to fatal from
> my
> > >>>> understanding.
> > >>>>
> > >>>> About maintaining a list of exceptions: I don't think this is too
> > hard,
> > >>>> and users also don't need to worry about it IMHO. We would only
> > exclude
> > >>>> exception Streams can handle itself (like ProducerFencedException)
> --
> > >>>> thus, if the handler has code to react to this, it would not be bad,
> > as
> > >>>> this code is just never called. In case that there is an exception
> > >>>> Streams could actually handle but we still call the handler (ie,
> bug),
> > >>>> there should be no harm either -- the handler needs to return either
> > >>>> CONTINUE or FAIL and we would obey. It could only happen, that
> Streams
> > >>>> dies---as request by the user(!)---even if Streams could actually
> > handle
> > >>>> the error and move on. But this seems to be not a issue from my
> point
> > of
> > >>>> view.
> > >>>>
> > >>>> Thus, I am still in favor of not calling the
> > ProductionExceptionHandler
> > >>>> for fatal exception.
> > >>>>
> > >>>>
> > >>>>
> > >>>> About the "always continue" case. Sounds good to me to remove it
> (not
> > >>>> sure why we need it in test package?) and to rename the "failing"
> > >>>> handler to "Default" (even if "default" is less descriptive and I
> > would
> > >>>> still prefer "Fail" in the name).
> > >>>>
> > >>>>
> > >>>> Last question:
> > >>>>
> > >>>>>> - Continue to *only* invoke it on the first exception that we
> > >>>>>> encounter (before sendException is set)
> > >>>>
> > >>>>
> > >>>> What is there reasoning for invoking the handler only for the first
> > >>>> exception?
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
> > >>>>> Alright, here are some updates I'm planning to make after thinking
> on
> > >>>> this
> > >>>>> for awhile:
> > >>>>>
> > >>>>> - Given that the "always continue" handler isn't something I'd
> > >>>> recommend
> > >>>>> for production use as is, I'm going to move it into the test
> > >>>> namespace and
> > >>>>> remove it from mention in the public API.
> > >>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler"
> to
> > >>>>> "DefaultProductionExceptionHandler"
> > >>>>> - Given that the API for the exception handler involves being
> > >>>> invoked by
> > >>>>> streams to make a decision about whether or not to continue — I
> > >>>> think that
> > >>>>> we should:
> > >>>>> - Continue to *only* invoke it on the first exception that we
> > >>>>> encounter (before sendException is set)
> > >>>>> - Stop invoking it for the self-healing fenced exceptions.
> > >>>>> - I think I would rather invoke it for all exceptions that could
> > >>>> occur
> > >>>>> from attempting to produce - even those exceptions were returning
> > >>>> CONTINUE
> > >>>>> may not be a good idea (e.g. Authorization exception). Until there
> > >>>> is a
> > >>>>> different type for exceptions that are totally fatal (for example a
> > >>>>> FatalStreamsException or some sort), maintaining a list of
> > >>>> exceptions that
> > >>>>> you can intercept with this handler and exceptions you cannot would
> > >>>> be
> > >>>>> really error-prone and hard to keep correct.
> > >>>>> - I'm happy to file a KIP for the creation of this new Exception
> > >>>> type
> > >>>>> if there is interest.
> > >>>>>
> > >>>>> @ Matthias — What do you think about the above?
> > >>>>>
> > >>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> > >>>>>
> > >>>>>> I responded before reading your code review and didn't see the bit
> > >>>> about
> > >>>>>> how ProducerFencedException is self-healing. This error handling
> > logic
> > >>>> is
> > >>>>>> *quite* confusing to reason about... I think I'm going to sit down
> > > with
> > >>>>>> the code a bit more today, but I'm inclined to think that if the
> > > fenced
> > >>>>>> exceptions are, indeed, self healing that we still invoke the
> > handler
> > >>>> but
> > >>>>>> ignore its result for those exceptions.
> > >>>>>>
> > >>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
> > >>>>>>
> > >>>>>>> Hi there,
> > >>>>>>>
> > >>>>>>> Following up here...
> > >>>>>>>
> > >>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
> > >>>>>>> handler implementation names -- it sounds "cleaner" to me without
> > it.
> > >>>>>>>
> > >>>>>>> Let me think on this. I generally prefer expressiveness to
> > > clean-ness,
> > >>>>>>> and I think that comes out in the names I chose for things. The
> > straw
> > >>>> man
> > >>>>>>> in my head is the person that tried to substitute in the
> > >>>> "AlwaysContinue"
> > >>>>>>> variant and the "Always" is a trigger to really consider whether
> or
> > >>>> not
> > >>>>>>> they *always* want to try to continue.
> > >>>>>>>
> > >>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
> > > variant
> > >>>>>>> isn't something I'd recommend generally in a production system.
> > >>>> Ideally
> > >>>>>>> these handlers are targeted at handling a specific series of
> > >>>> exceptions
> > >>>>>>> that a user wants to ignore, and not ignoring all exceptions.
> More
> > on
> > >>>> this
> > >>>>>>> in a minute.
> > >>>>>>>
> > >>>>>>>> For the first category, it seems to not make sense to call the
> > >>>> handle but
> > >>>>>>> Streams should always fail. If we follow this design, the KIP
> > should
> > >>>> list
> > >>>>>>> all exceptions for which the handler is not called.
> > >>>>>>>
> > >>>>>>> I strongly disagree here. The purpose of this handler isn't
> *just*
> > to
> > >>>>>>> make a decision for streams. There may also be desirable side
> > effects
> > >>>> that
> > >>>>>>> users wish to cause when production exceptions occur. There may
> be
> > >>>>>>> side-effects that they wish to cause when
> AuthenticationExceptions
> > >>>> occur,
> > >>>>>>> as well. We should still give them the hooks to preform those
> side
> > >>>> effects.
> > >>>>>>>
> > >>>>>>> In light of the above, I'm thinking that the
> > >>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
> > probably be
> > >>>>>>> removed from the public API and moved into tests since that's
> where
> > >>>> it's
> > >>>>>>> most useful. The more I think on it, the more I feel that having
> > that
> > >>>> in
> > >>>>>>> the public API is a landmine. If you agree, then, we can rename
> the
> > >>>>>>> "AlwaysFailProductionExceptionHandler" to
> > >>>>>>> "DefaultProductionExceptionHandler".
> > >>>>>>>
> > >>>>>>> Thoughts?
> > >>>>>>>
> > >>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
> > >>>> matthias@confluent.io>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I just review the PR, and there is one thing we should discuss.
> > >>>>>>>>
> > >>>>>>>> There are different types of exceptions that could occur. Some
> > apply
> > >>>> to
> > >>>>>>>> all records (like Authorization exception) while others are for
> > >>>> single
> > >>>>>>>> records only (like record too large).
> > >>>>>>>>
> > >>>>>>>> For the first category, it seems to not make sense to call the
> > > handle
> > >>>>>>>> but Streams should always fail. If we follow this design, the
> KIP
> > >>>> should
> > >>>>>>>> list all exceptions for which the handler is not called.
> > >>>>>>>>
> > >>>>>>>> WDYT?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -Matthias
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> > >>>>>>>>> Just catching up on this KIP.
> > >>>>>>>>>
> > >>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
> the
> > >>>>>>>> handler
> > >>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> -Matthias
> > >>>>>>>>>
> > >>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
> > >>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying
> to
> > >>>> also
> > >>>>>>>>>> update the KIP accordingly, but cwiki is being slow and
> dropping
> > >>>>>>>>>> connections..... I'll try again a bit later but please
> consider
> > > the
> > >>>>>>>> KIP
> > >>>>>>>>>> updated for all intents and purposes. heh.
> > >>>>>>>>>>
> > >>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
> > wangguoz@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> That makes sense.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Guozhang
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
> > >>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many
> times
> > > by
> > >>>>>>>>>>>> unintentionally shipping code that fails to properly
> implement
> > >>>>>>>> logging. I
> > >>>>>>>>>>>> always discover this at the exact *worst* moment, too.
> > (Normally
> > >>>> at
> > >>>>>>>> 3 AM
> > >>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
> > same
> > >>>> way
> > >>>>>>>> I
> > >>>>>>>>>>> could
> > >>>>>>>>>>>> probably be convinced to remove it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> We could also meet halfway and say that when a customized
> > >>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we
> > log
> > >>>> at
> > >>>>>>>> DEBUG
> > >>>>>>>>>>>> level instead of WARN level. Would that alternative be
> > appealing
> > >>>> to
> > >>>>>>>> you?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
> > >>>> wangguoz@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again
> and
> > > it
> > >>>>>>>> looks
> > >>>>>>>>>>>>> good.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> About whether record collector should still internally log
> > the
> > >>>>>>>> error in
> > >>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
> > >>>> does. I
> > >>>>>>>>>>>>> personally would prefer only to log if the returned value
> is
> > >>>> FAIL
> > >>>>>>>> to
> > >>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
> > the
> > >>>>>>>>>>> exception
> > >>>>>>>>>>>>> handler.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Guozhang
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
> > >>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
> > this
> > >>>>>>>> KIP as
> > >>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
> > again!
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
> > >>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot
> > to
> > >>>>>>>>>>> mention
> > >>>>>>>>>>>>> them
> > >>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the
> oversight.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> The logging is actually provided for in the record
> > collector.
> > >>>>>>>>>>>> Whenever
> > >>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
> > >>>>>>>>>>>> *impossible*
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> write a handler that totally suppresses production
> > exceptions
> > >>>>>>>> from
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>> log.
> > >>>>>>>>>>>>>>> As such, the default continue handler just returns the
> > >>>> continue
> > >>>>>>>>>>>> value.
> > >>>>>>>>>>>>> I
> > >>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
> > >>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> > >>>>>>>>>>>> matthias@confluent.io
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> One more comment.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> You mention a default implementation for the handler
> that
> > >>>>>>>> fails. I
> > >>>>>>>>>>>>>>>> think, this should be part of the public API and thus
> > should
> > >>>>>>>> have
> > >>>>>>>>>>> a
> > >>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> We could also add a second implementation for the
> > >>>>>>>> log-and-move-on
> > >>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
> > class
> > >>>>>>>> should
> > >>>>>>>>>>>>> also
> > >>>>>>>>>>>>>>>> be part of public API (so users can just set in the
> > config)
> > >>>>>>>> with a
> > >>>>>>>>>>>>>>>> proper name.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > >>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
> > >>>> compatible
> > >>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
> > > whenever
> > >>>>>>>> one
> > >>>>>>>>>>>> of
> > >>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>> PRs
> > >>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
> > 4148
> > >>>> goes
> > >>>>>>>>>>> in
> > >>>>>>>>>>>>>>>> first. :)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> > >>>>>>>>>>> wangguoz@gmail.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
> > >>>> another
> > >>>>>>>>>>>>> ongoing
> > >>>>>>>>>>>>>> PR
> > >>>>>>>>>>>>>>>>>> that may touch the same classes that you are working
> on:
> > >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
> > and
> > >>>> see
> > >>>>>>>>>>> if
> > >>>>>>>>>>>> it
> > >>>>>>>>>>>>>> is
> > >>>>>>>>>>>>>>>>>> compatible with your changes.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Guozhang
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
> > >>>> matt@frmr.me>
> > >>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
> > >>>>>>>>>>> currently
> > >>>>>>>>>>>>>>>> written:
> > >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
> > needs
> > >>>>>>>>>>> some
> > >>>>>>>>>>>>>> tests
> > >>>>>>>>>>>>>>>>>>> added,
> > >>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
> > > discuss,
> > >>>>>>>>>>>> please
> > >>>>>>>>>>>>>> let
> > >>>>>>>>>>>>>>>> me
> > >>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
> > I'll
> > >>>>>>>>>>>> probably
> > >>>>>>>>>>>>>>>> start
> > >>>>>>>>>>>>>>>>>> a
> > >>>>>>>>>>>>>>>>>>> [VOTE] thread.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>>>>>>> Matt
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
> > matt@frmr.me
> > >>>>>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this,
> I
> > >>>> either
> > >>>>>>>>>>>> want
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down
> so
> > >>>> that I
> > >>>>>>>>>>>> can
> > >>>>>>>>>>>>>> spin
> > >>>>>>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When
> we
> > >>>> start
> > >>>>>>>>>>> up
> > >>>>>>>>>>>>>> after
> > >>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message
> > we
> > >>>>>>>>>>> failed
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> produce
> > >>>>>>>>>>>>>>>>>>>> again.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
> > opens
> > >>>> you
> > >>>>>>>>>>> up
> > >>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>> putting
> > >>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
> > >>>> order.
> > >>>>>>>>>>>>> However
> > >>>>>>>>>>>>>>>> if
> > >>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> > >>>>>>>>>>>>> wangguoz@gmail.com>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
> understanding
> > > is
> > >>>>>>>>>>> that
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> record
> > >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> > while
> > >>>>>>>>>>> trying
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>> produce.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
> > realized
> > >>>> that
> > >>>>>>>>>>>>> since
> > >>>>>>>>>>>>>> we
> > >>>>>>>>>>>>>>>>>> are
> > >>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
> > may
> > >>>> not
> > >>>>>>>>>>> be
> > >>>>>>>>>>>>>>>>>> implement
> > >>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue
> for
> > >>>>>>>> future
> > >>>>>>>>>>>>>>>>>> processing.
> > >>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Guozhang
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
> > >>>>>>>> matt@frmr.me
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Hello all,
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation,
> and
> > >>>> made
> > >>>>>>>>>>> it
> > >>>>>>>>>>>> so
> > >>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>> its
> > >>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
> > more
> > >>>>>>>>>>>> closely
> > >>>>>>>>>>>>>> with
> > >>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I
> need
> > to
> > >>>>>>>>>>> reply
> > >>>>>>>>>>>>>>>> to.....
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> are
> > >>>>>>>> around
> > >>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> record,
> > >>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
> > be
> > >>>>>>>>>>>> generics
> > >>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > >>>>>>>>>>>>> `ProducerRecord<?
> > >>>>>>>>>>>>>>>>>>>>> extends
> > >>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that
> this
> > >>>> is a
> > >>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
> > would
> > >>>> just
> > >>>>>>>>>>>> make
> > >>>>>>>>>>>>>> it
> > >>>>>>>>>>>>>>>>>>>>> harder
> > >>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> > >>>>>>>>>>>> `RecordMetadata`
> > >>>>>>>>>>>>> as
> > >>>>>>>>>>>>>>>>>>> well
> > >>>>>>>>>>>>>>>>>>>>> in
> > >>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
> understanding
> > > is
> > >>>>>>>>>>> that
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>> record
> > >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> > while
> > >>>>>>>>>>> trying
> > >>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>>>>>>>> produce.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> > >>>> following
> > >>>>>>>>>>>>>> handling
> > >>>>>>>>>>>>>>>>>>>>> logic
> > >>>>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
> > what
> > >>>> you
> > >>>>>>>>>>>>> think.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>> Matt
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
> > >>>> matt@frmr.me
> > >>>>>>>>>
> > >>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference
> right
> > >>>> now
> > >>>>>>>>>>> and
> > >>>>>>>>>>>> am
> > >>>>>>>>>>>>>>>>>>>>> planning
> > >>>>>>>>>>>>>>>>>>>>>> on
> > >>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
> > >>>>>>>> conversation
> > >>>>>>>>>>>>> later
> > >>>>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>> week.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> > >>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> > >>>>>>>>>>>>>> wangguoz@gmail.com
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
> > >>>>>>>>>>>>>> ProductionExceptionHandlerResp
> > >>>>>>>>>>>>>>>>>>> onse,
> > >>>>>>>>>>>>>>>>>>>>>>>> could
> > >>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
> > actual
> > >>>>>>>>>>> added
> > >>>>>>>>>>>>>> config
> > >>>>>>>>>>>>>>>>>>>>> names
> > >>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>> 161%3A+streams+
> > >>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> > >>>>>>>>>>>>>>>>>>>>>>>> described).
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> > are
> > >>>>>>>>>>> around
> > >>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>> record,
> > >>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
> or
> > be
> > >>>>>>>>>>>> generics
> > >>>>>>>>>>>>>> of
> > >>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > >>>>>>>>>>>>> `ProducerRecord<?
> > >>>>>>>>>>>>>>>>>>>>> extends
> > >>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> > >>>>>>>>>>>> `RecordMetadata`
> > >>>>>>>>>>>>>> as
> > >>>>>>>>>>>>>>>>>>>>> well in
> > >>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> > >>>> following
> > >>>>>>>>>>>>>> handling
> > >>>>>>>>>>>>>>>>>>>>> logic
> > >>>>>>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
> > throw
> > >>>>>>>>>>>> exception
> > >>>>>>>>>>>>>>>>>>>>>> immediately
> > >>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the
> error
> > > and
> > >>>>>>>>>>> drop
> > >>>>>>>>>>>>>> record
> > >>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
> > specific
> > >>>>>>>>>>> "error"
> > >>>>>>>>>>>>>> Kafka
> > >>>>>>>>>>>>>>>>>>>>> topic,
> > >>>>>>>>>>>>>>>>>>>>>>>> or
> > >>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>> https://kafka.apache.org/documentation/#kafka_streams_
> > >>>>>>>>>>>>>> monitoring
> > >>>>>>>>>>>>>>>>>> )
> > >>>>>>>>>>>>>>>>>>>>> for
> > >>>>>>>>>>>>>>>>>>>>>>>> monitoring.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> Guozhang
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> > >>>>>>>>>>> matt@frmr.me
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
> > > uses
> > >>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler"
> for
> > the
> > >>>>>>>>>>>> config
> > >>>>>>>>>>>>>>>>>>> name. No
> > >>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
> > >>>> should
> > >>>>>>>>>>> use
> > >>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> > >>>>>>>>>>> matt@frmr.me
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized
> record
> > is
> > >>>>>>>>>>>>> possible,
> > >>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>> changing
> > >>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly
> doable.
> > >>>>>>>>>>> However,
> > >>>>>>>>>>>>>>>>>>> because
> > >>>>>>>>>>>>>>>>>>>>>> we're
> > >>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have
> > no
> > >>>>>>>>>>>> guarantee
> > >>>>>>>>>>>>>>>>>>> that
> > >>>>>>>>>>>>>>>>>>>>> the
> > >>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
> > time
> > >>>> this
> > >>>>>>>>>>>>>>>>>>> particular
> > >>>>>>>>>>>>>>>>>>>>>>>> handler
> > >>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
> > >>>>>>>>>>> something
> > >>>>>>>>>>>>>>>>>> like:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse
> handle(final
> > >>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
> > >>>>>>>>>>>>>>>>>>>>>>>>> record,
> > >>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> > >>>>>>>>>>>> matt@frmr.me>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
> > and
> > >>>>>>>>>>> return
> > >>>>>>>>>>>>> type
> > >>>>>>>>>>>>>>>>>>> but
> > >>>>>>>>>>>>>>>>>>>>>> I’m
> > >>>>>>>>>>>>>>>>>>>>>>>> not
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and
> > the
> > >>>>>>>>>>> record
> > >>>>>>>>>>>>>>>>>>> itself
> > >>>>>>>>>>>>>>>>>>>>>>>> without
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
> > Sax <
> > >>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>> 161%3A+streams+
> > >>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
> > (parameter
> > >>>>>>>>>>> naming,
> > >>>>>>>>>>>>>>>>>>> return
> > >>>>>>>>>>>>>>>>>>>>>>>> types,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to
> get
> > a
> > >>>>>>>>>>> unified
> > >>>>>>>>>>>>>>>>>> user
> > >>>>>>>>>>>>>>>>>>>>>>>>> experience.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
> > update
> > >>>> the
> > >>>>>>>>>>>> KIP
> > >>>>>>>>>>>>>>>>>>>>>>>> accordingly.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> > >>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to
> the
> > >>>> KIP ?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
> > should
> > >>>> be
> > >>>>>>>>>>>>>>>>>>> specified.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
> > Farmer <
> > >>>>>>>>>>>>>>>>>>> matt@frmr.me>
> > >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the
> KIP
> > >>>> that I
> > >>>>>>>>>>>> just
> > >>>>>>>>>>>>>>>>>>> filed
> > >>>>>>>>>>>>>>>>>>>>>>>> here:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> > >>>>>>>>>>>> error+handling++when+Kafka+
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback
> > from
> > >>>>>>>>>>> folks
> > >>>>>>>>>>>>>>>>>> about
> > >>>>>>>>>>>>>>>>>>>>> this
> > >>>>>>>>>>>>>>>>>>>>>>>> idea
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
> > contribute
> > >>>> back.
> > >>>>>>>>>>> :)
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> --
> > >>>>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> --
> > >>>>>>>>>>> -- Guozhang
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>
> > >
> >
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
There is already a vote thread for this KIP. I can bump it so that it’s
towards the top of your inbox.

With regard to your concerns:

1) We do not have the "ProductionExceptionHandler" interface defined in the
wiki page, thought it is sort of clear that it is a one-function interface
with record and exception. Could you add it?


It is defined, it’s just not defined using a code snippet. The KIP reads as
follows:

===

A public interface named ProductionExceptionHandler with a single method,
handle, that has the following signature:

   - ProductionExceptionHandlerResponse handle(ProducerRecord<byte[],
   byte[]> record, Exception exception)


===

If you’d like me to add a code snippet illustrating this that’s simple for
me to do, but it seemed superfluous.

2) A quick question about your example code: where would be the "logger"
object be created?


SLF4J loggers are typically created as a class member in the class. Such as:

private Logger logger = LoggerFactory.getLogger(HelloWorld.class);

I omit that in my implementation examples for brevity.

On December 6, 2017 at 2:14:58 AM, Guozhang Wang (wangguoz@gmail.com) wrote:

Hello Matt,

Thanks for writing up the KIP. I made a pass over it and here is a few
minor comments. I think you can consider starting a voting thread for this
KIP while addressing them.

1) We do not have the "ProductionExceptionHandler" interface defined in the
wiki page, thought it is sort of clear that it is a one-function interface
with record and exception. Could you add it?

2) A quick question about your example code: where would be the "logger"
object be created? Note that the implementation of this interface have to
give a non-param constructor, or as a static field of the class but in that
case you would not be able to log which instance is throwing this error (we
may have multiple producers within a single instance, even within a
thread). Just a reminder to consider in your implementation.


Guozhang

On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks a lot for the update! Great write-up! Very clearly explained what
> the change will look like!
>
> Looks good to me. No further comments from my side.
>
>
> -Matthias
>
>
> On 12/5/17 9:14 AM, Matt Farmer wrote:
> > I have updated this KIP accordingly.
> >
> > Can you please take a look and let me know if what I wrote looks
correct
> to
> > you?
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 210+-+Provide+for+custom+error+handling++when+Kafka+
> Streams+fails+to+produce
> >
> > Thanks!
> >
> > Matt
> >
> >
> > On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
> >
> > Hey Matthias, thanks for getting back to me.
> >
> > That's fine. But if we add it to `test` package, we don't need to talk
> > about it in the KIP. `test` is not public API.
> >
> > Yes, that makes sense. It was in the KIP originally because I was, at
one
> > point, planning on including it. We can remove it now that we’ve
decided
> we
> > won’t include it in the public API.
> >
> > Understood. That makes sense. We should explain this clearly in the KIP
> > and maybe log all other following exceptions at DEBUG level?
> >
> >
> > I thought it was clear in the KIP, but I can go back and double check
my
> > wording and revise it to try and make it clearer.
> >
> > I’ll take a look at doing more work on the KIP and the Pull Request
> > tomorrow.
> >
> > Thanks again!
> >
> > On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
> matthias@confluent.io)
> > wrote:
> >
> > Hey,
> >
> > About your questions:
> >
> >>>> Acknowledged, so is ProducerFencedException the only kind of
> exception I
> >>>> need to change my behavior on? Or are there other types I need to
> > check? Is
> >>>> there a comprehensive list somewhere?
> >
> > I cannot think if any other atm. We should list all fatal exceptions
for
> > which we don't call the handler and explain why (exception is "global"
> > and will affect all other records, too | ProducerFenced is
self-healing).
> >
> > We started to collect and categorize exception here (not completed
yet):
> > https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> > :
> >
> > This list should be a good starting point though.
> >
> >> I include it in the test package because I have tests that assert that
> if
> >> the record collector impl encounters an Exception and receives a
> CONTINUE
> >> that it actually does CONTINUE.
> >
> > That's fine. But if we add it to `test` package, we don't need to talk
> > about it in the KIP. `test` is not public API.
> >
> >> I didn't want to invoke the handler in places where the CONTINUE or
FAIL
> >> result would just be ignored. Presumably, after a FAIL has been
> returned,
> >> subsequent exceptions are likely to be repeats or noise from my
> >> understanding of the code paths. If this is incorrect we can revisit.
> >
> > Understood. That makes sense. We should explain this clearly in the KIP
> > and maybe log all other following exceptions at DEBUG level?
> >
> >
> > -Matthias
> >
> >
> > On 12/1/17 11:43 AM, Matt Farmer wrote:
> >> Bump! It's been three days here and I haven't seen any further
feedback.
> >> Eager to get this resolved, approved, and merged. =)
> >>
> >> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
> >>
> >>> Hi there, sorry for the delay in responding. Last week had a holiday
> and
> >>> several busy work days in it so I'm just now getting around to
> > responding.
> >>>
> >>>> We would only exclude
> >>>> exception Streams can handle itself (like ProducerFencedException)
-- 
> >>>> thus, if the handler has code to react to this, it would not be bad,
> as
> >>>> this code is just never called.
> >>> [...]
> >>>> Thus, I am still in favor of not calling the
> ProductionExceptionHandler
> >>>> for fatal exception.
> >>>
> >>> Acknowledged, so is ProducerFencedException the only kind of
exception
> I
> >>> need to change my behavior on? Or are there other types I need to
> check?
> > Is
> >>> there a comprehensive list somewhere?
> >>>
> >>>> About the "always continue" case. Sounds good to me to remove it
(not
> >>>> sure why we need it in test package?)
> >>>
> >>> I include it in the test package because I have tests that assert
that
> if
> >>> the record collector impl encounters an Exception and receives a
> CONTINUE
> >>> that it actually does CONTINUE.
> >>>
> >>>> What is there reasoning for invoking the handler only for the first
> >>>> exception?
> >>>
> >>> I didn't want to invoke the handler in places where the CONTINUE or
> FAIL
> >>> result would just be ignored. Presumably, after a FAIL has been
> returned,
> >>> subsequent exceptions are likely to be repeats or noise from my
> >>> understanding of the code paths. If this is incorrect we can revisit.
> >>>
> >>> Once I get the answers to these questions I can make another pass on
> the
> >>> pull request!
> >>>
> >>> Matt
> >>>
> >>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matthias@confluent.io
> >
> >>> wrote:
> >>>
> >>>> Thanks for following up!
> >>>>
> >>>> One thought about an older reply from you:
> >>>>
> >>>>>>>> I strongly disagree here. The purpose of this handler isn't
*just*
> > to
> >>>>>>>> make a decision for streams. There may also be desirable side
> >>>> effects that
> >>>>>>>> users wish to cause when production exceptions occur. There may
be
> >>>>>>>> side-effects that they wish to cause when
AuthenticationExceptions
> >>>> occur,
> >>>>>>>> as well. We should still give them the hooks to preform those
side
> >>>> effects.
> >>>>
> >>>> And your follow up:
> >>>>
> >>>>>> - I think I would rather invoke it for all exceptions that could
> >>>> occur
> >>>>>> from attempting to produce - even those exceptions were returning
> >>>> CONTINUE
> >>>>>> may not be a good idea (e.g. Authorization exception). Until there
> >>>> is a
> >>>>>> different type for exceptions that are totally fatal (for example
a
> >>>>>> FatalStreamsException or some sort), maintaining a list of
> >>>> exceptions that
> >>>>>> you can intercept with this handler and exceptions you cannot
would
> >>>> be
> >>>>>> really error-prone and hard to keep correct.
> >>>>
> >>>> I understand what you are saying, however, consider that Streams
needs
> >>>> to die for a fatal exception. Thus, if you call the handler for a
> fatal
> >>>> exception, we would need to ignore the return value and fail -- this
> >>>> seems to be rather intuitive. Furthermore, users can register an
> >>>> uncaught-exception-handler and side effects for fatal errors can be
> >>>> triggered there.
> >>>>
> >>>> Btw: an AuthorizationException is fatal -- not sure what you mean by
> an
> >>>> "totally fatal" exception -- there is no superlative to fatal from
my
> >>>> understanding.
> >>>>
> >>>> About maintaining a list of exceptions: I don't think this is too
> hard,
> >>>> and users also don't need to worry about it IMHO. We would only
> exclude
> >>>> exception Streams can handle itself (like ProducerFencedException)
-- 
> >>>> thus, if the handler has code to react to this, it would not be bad,
> as
> >>>> this code is just never called. In case that there is an exception
> >>>> Streams could actually handle but we still call the handler (ie,
bug),
> >>>> there should be no harm either -- the handler needs to return either
> >>>> CONTINUE or FAIL and we would obey. It could only happen, that
Streams
> >>>> dies---as request by the user(!)---even if Streams could actually
> handle
> >>>> the error and move on. But this seems to be not a issue from my
point
> of
> >>>> view.
> >>>>
> >>>> Thus, I am still in favor of not calling the
> ProductionExceptionHandler
> >>>> for fatal exception.
> >>>>
> >>>>
> >>>>
> >>>> About the "always continue" case. Sounds good to me to remove it
(not
> >>>> sure why we need it in test package?) and to rename the "failing"
> >>>> handler to "Default" (even if "default" is less descriptive and I
> would
> >>>> still prefer "Fail" in the name).
> >>>>
> >>>>
> >>>> Last question:
> >>>>
> >>>>>> - Continue to *only* invoke it on the first exception that we
> >>>>>> encounter (before sendException is set)
> >>>>
> >>>>
> >>>> What is there reasoning for invoking the handler only for the first
> >>>> exception?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
> >>>>> Alright, here are some updates I'm planning to make after thinking
on
> >>>> this
> >>>>> for awhile:
> >>>>>
> >>>>> - Given that the "always continue" handler isn't something I'd
> >>>> recommend
> >>>>> for production use as is, I'm going to move it into the test
> >>>> namespace and
> >>>>> remove it from mention in the public API.
> >>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
> >>>>> "DefaultProductionExceptionHandler"
> >>>>> - Given that the API for the exception handler involves being
> >>>> invoked by
> >>>>> streams to make a decision about whether or not to continue — I
> >>>> think that
> >>>>> we should:
> >>>>> - Continue to *only* invoke it on the first exception that we
> >>>>> encounter (before sendException is set)
> >>>>> - Stop invoking it for the self-healing fenced exceptions.
> >>>>> - I think I would rather invoke it for all exceptions that could
> >>>> occur
> >>>>> from attempting to produce - even those exceptions were returning
> >>>> CONTINUE
> >>>>> may not be a good idea (e.g. Authorization exception). Until there
> >>>> is a
> >>>>> different type for exceptions that are totally fatal (for example a
> >>>>> FatalStreamsException or some sort), maintaining a list of
> >>>> exceptions that
> >>>>> you can intercept with this handler and exceptions you cannot would
> >>>> be
> >>>>> really error-prone and hard to keep correct.
> >>>>> - I'm happy to file a KIP for the creation of this new Exception
> >>>> type
> >>>>> if there is interest.
> >>>>>
> >>>>> @ Matthias — What do you think about the above?
> >>>>>
> >>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>
> >>>>>> I responded before reading your code review and didn't see the bit
> >>>> about
> >>>>>> how ProducerFencedException is self-healing. This error handling
> logic
> >>>> is
> >>>>>> *quite* confusing to reason about... I think I'm going to sit down
> > with
> >>>>>> the code a bit more today, but I'm inclined to think that if the
> > fenced
> >>>>>> exceptions are, indeed, self healing that we still invoke the
> handler
> >>>> but
> >>>>>> ignore its result for those exceptions.
> >>>>>>
> >>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>
> >>>>>>> Hi there,
> >>>>>>>
> >>>>>>> Following up here...
> >>>>>>>
> >>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
> >>>>>>> handler implementation names -- it sounds "cleaner" to me without
> it.
> >>>>>>>
> >>>>>>> Let me think on this. I generally prefer expressiveness to
> > clean-ness,
> >>>>>>> and I think that comes out in the names I chose for things. The
> straw
> >>>> man
> >>>>>>> in my head is the person that tried to substitute in the
> >>>> "AlwaysContinue"
> >>>>>>> variant and the "Always" is a trigger to really consider whether
or
> >>>> not
> >>>>>>> they *always* want to try to continue.
> >>>>>>>
> >>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
> > variant
> >>>>>>> isn't something I'd recommend generally in a production system.
> >>>> Ideally
> >>>>>>> these handlers are targeted at handling a specific series of
> >>>> exceptions
> >>>>>>> that a user wants to ignore, and not ignoring all exceptions.
More
> on
> >>>> this
> >>>>>>> in a minute.
> >>>>>>>
> >>>>>>>> For the first category, it seems to not make sense to call the
> >>>> handle but
> >>>>>>> Streams should always fail. If we follow this design, the KIP
> should
> >>>> list
> >>>>>>> all exceptions for which the handler is not called.
> >>>>>>>
> >>>>>>> I strongly disagree here. The purpose of this handler isn't
*just*
> to
> >>>>>>> make a decision for streams. There may also be desirable side
> effects
> >>>> that
> >>>>>>> users wish to cause when production exceptions occur. There may
be
> >>>>>>> side-effects that they wish to cause when
AuthenticationExceptions
> >>>> occur,
> >>>>>>> as well. We should still give them the hooks to preform those
side
> >>>> effects.
> >>>>>>>
> >>>>>>> In light of the above, I'm thinking that the
> >>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
> probably be
> >>>>>>> removed from the public API and moved into tests since that's
where
> >>>> it's
> >>>>>>> most useful. The more I think on it, the more I feel that having
> that
> >>>> in
> >>>>>>> the public API is a landmine. If you agree, then, we can rename
the
> >>>>>>> "AlwaysFailProductionExceptionHandler" to
> >>>>>>> "DefaultProductionExceptionHandler".
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
> >>>> matthias@confluent.io>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I just review the PR, and there is one thing we should discuss.
> >>>>>>>>
> >>>>>>>> There are different types of exceptions that could occur. Some
> apply
> >>>> to
> >>>>>>>> all records (like Authorization exception) while others are for
> >>>> single
> >>>>>>>> records only (like record too large).
> >>>>>>>>
> >>>>>>>> For the first category, it seems to not make sense to call the
> > handle
> >>>>>>>> but Streams should always fail. If we follow this design, the
KIP
> >>>> should
> >>>>>>>> list all exceptions for which the handler is not called.
> >>>>>>>>
> >>>>>>>> WDYT?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> >>>>>>>>> Just catching up on this KIP.
> >>>>>>>>>
> >>>>>>>>> One tiny comment: I would prefer to remove the "Always" from
the
> >>>>>>>> handler
> >>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
> >>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying
to
> >>>> also
> >>>>>>>>>> update the KIP accordingly, but cwiki is being slow and
dropping
> >>>>>>>>>> connections..... I'll try again a bit later but please
consider
> > the
> >>>>>>>> KIP
> >>>>>>>>>> updated for all intents and purposes. heh.
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
> wangguoz@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> That makes sense.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Guozhang
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many
times
> > by
> >>>>>>>>>>>> unintentionally shipping code that fails to properly
implement
> >>>>>>>> logging. I
> >>>>>>>>>>>> always discover this at the exact *worst* moment, too.
> (Normally
> >>>> at
> >>>>>>>> 3 AM
> >>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
> same
> >>>> way
> >>>>>>>> I
> >>>>>>>>>>> could
> >>>>>>>>>>>> probably be convinced to remove it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could also meet halfway and say that when a customized
> >>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we
> log
> >>>> at
> >>>>>>>> DEBUG
> >>>>>>>>>>>> level instead of WARN level. Would that alternative be
> appealing
> >>>> to
> >>>>>>>> you?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
> >>>> wangguoz@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again
and
> > it
> >>>>>>>> looks
> >>>>>>>>>>>>> good.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About whether record collector should still internally log
> the
> >>>>>>>> error in
> >>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
> >>>> does. I
> >>>>>>>>>>>>> personally would prefer only to log if the returned value
is
> >>>> FAIL
> >>>>>>>> to
> >>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
> the
> >>>>>>>>>>> exception
> >>>>>>>>>>>>> handler.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
> this
> >>>>>>>> KIP as
> >>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
> again!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot
> to
> >>>>>>>>>>> mention
> >>>>>>>>>>>>> them
> >>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the
oversight.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The logging is actually provided for in the record
> collector.
> >>>>>>>>>>>> Whenever
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
> >>>>>>>>>>>> *impossible*
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> write a handler that totally suppresses production
> exceptions
> >>>>>>>> from
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> log.
> >>>>>>>>>>>>>>> As such, the default continue handler just returns the
> >>>> continue
> >>>>>>>>>>>> value.
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
> >>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> >>>>>>>>>>>> matthias@confluent.io
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One more comment.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> You mention a default implementation for the handler
that
> >>>>>>>> fails. I
> >>>>>>>>>>>>>>>> think, this should be part of the public API and thus
> should
> >>>>>>>> have
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We could also add a second implementation for the
> >>>>>>>> log-and-move-on
> >>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
> class
> >>>>>>>> should
> >>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>> be part of public API (so users can just set in the
> config)
> >>>>>>>> with a
> >>>>>>>>>>>>>>>> proper name.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
> >>>> compatible
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
> > whenever
> >>>>>>>> one
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
> 4148
> >>>> goes
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>> first. :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> >>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
> >>>> another
> >>>>>>>>>>>>> ongoing
> >>>>>>>>>>>>>> PR
> >>>>>>>>>>>>>>>>>> that may touch the same classes that you are working
on:
> >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
> and
> >>>> see
> >>>>>>>>>>> if
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> compatible with your changes.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
> >>>> matt@frmr.me>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
> >>>>>>>>>>> currently
> >>>>>>>>>>>>>>>> written:
> >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
> needs
> >>>>>>>>>>> some
> >>>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>>>>>> added,
> >>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
> > discuss,
> >>>>>>>>>>>> please
> >>>>>>>>>>>>>> let
> >>>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
> I'll
> >>>>>>>>>>>> probably
> >>>>>>>>>>>>>>>> start
> >>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> [VOTE] thread.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
> matt@frmr.me
> >>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this,
I
> >>>> either
> >>>>>>>>>>>> want
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down
so
> >>>> that I
> >>>>>>>>>>>> can
> >>>>>>>>>>>>>> spin
> >>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When
we
> >>>> start
> >>>>>>>>>>> up
> >>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message
> we
> >>>>>>>>>>> failed
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> produce
> >>>>>>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
> opens
> >>>> you
> >>>>>>>>>>> up
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> putting
> >>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
> >>>> order.
> >>>>>>>>>>>>> However
> >>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> >>>>>>>>>>>>> wangguoz@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
understanding
> > is
> >>>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> while
> >>>>>>>>>>> trying
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
> realized
> >>>> that
> >>>>>>>>>>>>> since
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
> may
> >>>> not
> >>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue
for
> >>>>>>>> future
> >>>>>>>>>>>>>>>>>> processing.
> >>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
> >>>>>>>> matt@frmr.me
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation,
and
> >>>> made
> >>>>>>>>>>> it
> >>>>>>>>>>>> so
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
> more
> >>>>>>>>>>>> closely
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I
need
> to
> >>>>>>>>>>> reply
> >>>>>>>>>>>>>>>> to.....
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
are
> >>>>>>>> around
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
> be
> >>>>>>>>>>>> generics
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that
this
> >>>> is a
> >>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
> would
> >>>> just
> >>>>>>>>>>>> make
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> harder
> >>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>> well
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my
understanding
> > is
> >>>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> while
> >>>>>>>>>>> trying
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>> following
> >>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
> what
> >>>> you
> >>>>>>>>>>>>> think.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
> >>>> matt@frmr.me
> >>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference
right
> >>>> now
> >>>>>>>>>>> and
> >>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
> >>>>>>>> conversation
> >>>>>>>>>>>>> later
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> week.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> >>>>>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
> >>>>>>>>>>>>>> ProductionExceptionHandlerResp
> >>>>>>>>>>>>>>>>>>> onse,
> >>>>>>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
> actual
> >>>>>>>>>>> added
> >>>>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>> described).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> are
> >>>>>>>>>>> around
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`,
or
> be
> >>>>>>>>>>>> generics
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>> well in
> >>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>> following
> >>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
> throw
> >>>>>>>>>>>> exception
> >>>>>>>>>>>>>>>>>>>>>> immediately
> >>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the
error
> > and
> >>>>>>>>>>> drop
> >>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
> specific
> >>>>>>>>>>> "error"
> >>>>>>>>>>>>>> Kafka
> >>>>>>>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>> https://kafka.apache.org/documentation/#kafka_streams_
> >>>>>>>>>>>>>> monitoring
> >>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>> monitoring.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> >>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
> > uses
> >>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for
> the
> >>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>> name. No
> >>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
> >>>> should
> >>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> >>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized
record
> is
> >>>>>>>>>>>>> possible,
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly
doable.
> >>>>>>>>>>> However,
> >>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>> we're
> >>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have
> no
> >>>>>>>>>>>> guarantee
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
> time
> >>>> this
> >>>>>>>>>>>>>>>>>>> particular
> >>>>>>>>>>>>>>>>>>>>>>>> handler
> >>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
> >>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse
handle(final
> >>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
> >>>>>>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> >>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
> and
> >>>>>>>>>>> return
> >>>>>>>>>>>>> type
> >>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>> I’m
> >>>>>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and
> the
> >>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>> itself
> >>>>>>>>>>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
> Sax <
> >>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
> (parameter
> >>>>>>>>>>> naming,
> >>>>>>>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>>>>>>>>>> types,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to
get
> a
> >>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>>>> experience.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
> update
> >>>> the
> >>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to
the
> >>>> KIP ?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
> should
> >>>> be
> >>>>>>>>>>>>>>>>>>> specified.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
> Farmer <
> >>>>>>>>>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
> >>>> that I
> >>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>> filed
> >>>>>>>>>>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> >>>>>>>>>>>> error+handling++when+Kafka+
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback
> from
> >>>>>>>>>>> folks
> >>>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
> contribute
> >>>> back.
> >>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Matt,

Thanks for writing up the KIP. I made a pass over it and here is a few
minor comments. I think you can consider starting a voting thread for this
KIP while addressing them.

1) We do not have the "ProductionExceptionHandler" interface defined in the
wiki page, thought it is sort of clear that it is a one-function interface
with record and exception. Could you add it?

2) A quick question about your example code: where would be the "logger"
object be created? Note that the implementation of this interface have to
give a non-param constructor, or as a static field of the class but in that
case you would not be able to log which instance is throwing this error (we
may have multiple producers within a single instance, even within a
thread). Just a reminder to consider in your implementation.


Guozhang

On Tue, Dec 5, 2017 at 3:15 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks a lot for the update! Great write-up! Very clearly explained what
> the change will look like!
>
> Looks good to me. No further comments from my side.
>
>
> -Matthias
>
>
> On 12/5/17 9:14 AM, Matt Farmer wrote:
> > I have updated this KIP accordingly.
> >
> > Can you please take a look and let me know if what I wrote looks correct
> to
> > you?
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 210+-+Provide+for+custom+error+handling++when+Kafka+
> Streams+fails+to+produce
> >
> > Thanks!
> >
> > Matt
> >
> >
> > On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
> >
> > Hey Matthias, thanks for getting back to me.
> >
> > That's fine. But if we add it to `test` package, we don't need to talk
> > about it in the KIP. `test` is not public API.
> >
> > Yes, that makes sense. It was in the KIP originally because I was, at one
> > point, planning on including it. We can remove it now that we’ve decided
> we
> > won’t include it in the public API.
> >
> > Understood. That makes sense. We should explain this clearly in the KIP
> > and maybe log all other following exceptions at DEBUG level?
> >
> >
> > I thought it was clear in the KIP, but I can go back and double check my
> > wording and revise it to try and make it clearer.
> >
> > I’ll take a look at doing more work on the KIP and the Pull Request
> > tomorrow.
> >
> > Thanks again!
> >
> > On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (
> matthias@confluent.io)
> > wrote:
> >
> > Hey,
> >
> > About your questions:
> >
> >>>> Acknowledged, so is ProducerFencedException the only kind of
> exception I
> >>>> need to change my behavior on? Or are there other types I need to
> > check? Is
> >>>> there a comprehensive list somewhere?
> >
> > I cannot think if any other atm. We should list all fatal exceptions for
> > which we don't call the handler and explain why (exception is "global"
> > and will affect all other records, too | ProducerFenced is self-healing).
> >
> > We started to collect and categorize exception here (not completed yet):
> > https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> > :
> >
> > This list should be a good starting point though.
> >
> >> I include it in the test package because I have tests that assert that
> if
> >> the record collector impl encounters an Exception and receives a
> CONTINUE
> >> that it actually does CONTINUE.
> >
> > That's fine. But if we add it to `test` package, we don't need to talk
> > about it in the KIP. `test` is not public API.
> >
> >> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> >> result would just be ignored. Presumably, after a FAIL has been
> returned,
> >> subsequent exceptions are likely to be repeats or noise from my
> >> understanding of the code paths. If this is incorrect we can revisit.
> >
> > Understood. That makes sense. We should explain this clearly in the KIP
> > and maybe log all other following exceptions at DEBUG level?
> >
> >
> > -Matthias
> >
> >
> > On 12/1/17 11:43 AM, Matt Farmer wrote:
> >> Bump! It's been three days here and I haven't seen any further feedback.
> >> Eager to get this resolved, approved, and merged. =)
> >>
> >> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
> >>
> >>> Hi there, sorry for the delay in responding. Last week had a holiday
> and
> >>> several busy work days in it so I'm just now getting around to
> > responding.
> >>>
> >>>> We would only exclude
> >>>> exception Streams can handle itself (like ProducerFencedException) --
> >>>> thus, if the handler has code to react to this, it would not be bad,
> as
> >>>> this code is just never called.
> >>> [...]
> >>>> Thus, I am still in favor of not calling the
> ProductionExceptionHandler
> >>>> for fatal exception.
> >>>
> >>> Acknowledged, so is ProducerFencedException the only kind of exception
> I
> >>> need to change my behavior on? Or are there other types I need to
> check?
> > Is
> >>> there a comprehensive list somewhere?
> >>>
> >>>> About the "always continue" case. Sounds good to me to remove it (not
> >>>> sure why we need it in test package?)
> >>>
> >>> I include it in the test package because I have tests that assert that
> if
> >>> the record collector impl encounters an Exception and receives a
> CONTINUE
> >>> that it actually does CONTINUE.
> >>>
> >>>> What is there reasoning for invoking the handler only for the first
> >>>> exception?
> >>>
> >>> I didn't want to invoke the handler in places where the CONTINUE or
> FAIL
> >>> result would just be ignored. Presumably, after a FAIL has been
> returned,
> >>> subsequent exceptions are likely to be repeats or noise from my
> >>> understanding of the code paths. If this is incorrect we can revisit.
> >>>
> >>> Once I get the answers to these questions I can make another pass on
> the
> >>> pull request!
> >>>
> >>> Matt
> >>>
> >>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <matthias@confluent.io
> >
> >>> wrote:
> >>>
> >>>> Thanks for following up!
> >>>>
> >>>> One thought about an older reply from you:
> >>>>
> >>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
> > to
> >>>>>>>> make a decision for streams. There may also be desirable side
> >>>> effects that
> >>>>>>>> users wish to cause when production exceptions occur. There may be
> >>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
> >>>> occur,
> >>>>>>>> as well. We should still give them the hooks to preform those side
> >>>> effects.
> >>>>
> >>>> And your follow up:
> >>>>
> >>>>>> - I think I would rather invoke it for all exceptions that could
> >>>> occur
> >>>>>> from attempting to produce - even those exceptions were returning
> >>>> CONTINUE
> >>>>>> may not be a good idea (e.g. Authorization exception). Until there
> >>>> is a
> >>>>>> different type for exceptions that are totally fatal (for example a
> >>>>>> FatalStreamsException or some sort), maintaining a list of
> >>>> exceptions that
> >>>>>> you can intercept with this handler and exceptions you cannot would
> >>>> be
> >>>>>> really error-prone and hard to keep correct.
> >>>>
> >>>> I understand what you are saying, however, consider that Streams needs
> >>>> to die for a fatal exception. Thus, if you call the handler for a
> fatal
> >>>> exception, we would need to ignore the return value and fail -- this
> >>>> seems to be rather intuitive. Furthermore, users can register an
> >>>> uncaught-exception-handler and side effects for fatal errors can be
> >>>> triggered there.
> >>>>
> >>>> Btw: an AuthorizationException is fatal -- not sure what you mean by
> an
> >>>> "totally fatal" exception -- there is no superlative to fatal from my
> >>>> understanding.
> >>>>
> >>>> About maintaining a list of exceptions: I don't think this is too
> hard,
> >>>> and users also don't need to worry about it IMHO. We would only
> exclude
> >>>> exception Streams can handle itself (like ProducerFencedException) --
> >>>> thus, if the handler has code to react to this, it would not be bad,
> as
> >>>> this code is just never called. In case that there is an exception
> >>>> Streams could actually handle but we still call the handler (ie, bug),
> >>>> there should be no harm either -- the handler needs to return either
> >>>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
> >>>> dies---as request by the user(!)---even if Streams could actually
> handle
> >>>> the error and move on. But this seems to be not a issue from my point
> of
> >>>> view.
> >>>>
> >>>> Thus, I am still in favor of not calling the
> ProductionExceptionHandler
> >>>> for fatal exception.
> >>>>
> >>>>
> >>>>
> >>>> About the "always continue" case. Sounds good to me to remove it (not
> >>>> sure why we need it in test package?) and to rename the "failing"
> >>>> handler to "Default" (even if "default" is less descriptive and I
> would
> >>>> still prefer "Fail" in the name).
> >>>>
> >>>>
> >>>> Last question:
> >>>>
> >>>>>> - Continue to *only* invoke it on the first exception that we
> >>>>>> encounter (before sendException is set)
> >>>>
> >>>>
> >>>> What is there reasoning for invoking the handler only for the first
> >>>> exception?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
> >>>>> Alright, here are some updates I'm planning to make after thinking on
> >>>> this
> >>>>> for awhile:
> >>>>>
> >>>>> - Given that the "always continue" handler isn't something I'd
> >>>> recommend
> >>>>> for production use as is, I'm going to move it into the test
> >>>> namespace and
> >>>>> remove it from mention in the public API.
> >>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
> >>>>> "DefaultProductionExceptionHandler"
> >>>>> - Given that the API for the exception handler involves being
> >>>> invoked by
> >>>>> streams to make a decision about whether or not to continue — I
> >>>> think that
> >>>>> we should:
> >>>>> - Continue to *only* invoke it on the first exception that we
> >>>>> encounter (before sendException is set)
> >>>>> - Stop invoking it for the self-healing fenced exceptions.
> >>>>> - I think I would rather invoke it for all exceptions that could
> >>>> occur
> >>>>> from attempting to produce - even those exceptions were returning
> >>>> CONTINUE
> >>>>> may not be a good idea (e.g. Authorization exception). Until there
> >>>> is a
> >>>>> different type for exceptions that are totally fatal (for example a
> >>>>> FatalStreamsException or some sort), maintaining a list of
> >>>> exceptions that
> >>>>> you can intercept with this handler and exceptions you cannot would
> >>>> be
> >>>>> really error-prone and hard to keep correct.
> >>>>> - I'm happy to file a KIP for the creation of this new Exception
> >>>> type
> >>>>> if there is interest.
> >>>>>
> >>>>> @ Matthias — What do you think about the above?
> >>>>>
> >>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>
> >>>>>> I responded before reading your code review and didn't see the bit
> >>>> about
> >>>>>> how ProducerFencedException is self-healing. This error handling
> logic
> >>>> is
> >>>>>> *quite* confusing to reason about... I think I'm going to sit down
> > with
> >>>>>> the code a bit more today, but I'm inclined to think that if the
> > fenced
> >>>>>> exceptions are, indeed, self healing that we still invoke the
> handler
> >>>> but
> >>>>>> ignore its result for those exceptions.
> >>>>>>
> >>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>
> >>>>>>> Hi there,
> >>>>>>>
> >>>>>>> Following up here...
> >>>>>>>
> >>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
> >>>>>>> handler implementation names -- it sounds "cleaner" to me without
> it.
> >>>>>>>
> >>>>>>> Let me think on this. I generally prefer expressiveness to
> > clean-ness,
> >>>>>>> and I think that comes out in the names I chose for things. The
> straw
> >>>> man
> >>>>>>> in my head is the person that tried to substitute in the
> >>>> "AlwaysContinue"
> >>>>>>> variant and the "Always" is a trigger to really consider whether or
> >>>> not
> >>>>>>> they *always* want to try to continue.
> >>>>>>>
> >>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
> > variant
> >>>>>>> isn't something I'd recommend generally in a production system.
> >>>> Ideally
> >>>>>>> these handlers are targeted at handling a specific series of
> >>>> exceptions
> >>>>>>> that a user wants to ignore, and not ignoring all exceptions. More
> on
> >>>> this
> >>>>>>> in a minute.
> >>>>>>>
> >>>>>>>> For the first category, it seems to not make sense to call the
> >>>> handle but
> >>>>>>> Streams should always fail. If we follow this design, the KIP
> should
> >>>> list
> >>>>>>> all exceptions for which the handler is not called.
> >>>>>>>
> >>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
> to
> >>>>>>> make a decision for streams. There may also be desirable side
> effects
> >>>> that
> >>>>>>> users wish to cause when production exceptions occur. There may be
> >>>>>>> side-effects that they wish to cause when AuthenticationExceptions
> >>>> occur,
> >>>>>>> as well. We should still give them the hooks to preform those side
> >>>> effects.
> >>>>>>>
> >>>>>>> In light of the above, I'm thinking that the
> >>>>>>> "AlwaysContinueProductionExceptionHandler" variant should
> probably be
> >>>>>>> removed from the public API and moved into tests since that's where
> >>>> it's
> >>>>>>> most useful. The more I think on it, the more I feel that having
> that
> >>>> in
> >>>>>>> the public API is a landmine. If you agree, then, we can rename the
> >>>>>>> "AlwaysFailProductionExceptionHandler" to
> >>>>>>> "DefaultProductionExceptionHandler".
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
> >>>> matthias@confluent.io>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I just review the PR, and there is one thing we should discuss.
> >>>>>>>>
> >>>>>>>> There are different types of exceptions that could occur. Some
> apply
> >>>> to
> >>>>>>>> all records (like Authorization exception) while others are for
> >>>> single
> >>>>>>>> records only (like record too large).
> >>>>>>>>
> >>>>>>>> For the first category, it seems to not make sense to call the
> > handle
> >>>>>>>> but Streams should always fail. If we follow this design, the KIP
> >>>> should
> >>>>>>>> list all exceptions for which the handler is not called.
> >>>>>>>>
> >>>>>>>> WDYT?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> >>>>>>>>> Just catching up on this KIP.
> >>>>>>>>>
> >>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
> >>>>>>>> handler
> >>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
> >>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying to
> >>>> also
> >>>>>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
> >>>>>>>>>> connections..... I'll try again a bit later but please consider
> > the
> >>>>>>>> KIP
> >>>>>>>>>> updated for all intents and purposes. heh.
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <
> wangguoz@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> That makes sense.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Guozhang
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times
> > by
> >>>>>>>>>>>> unintentionally shipping code that fails to properly implement
> >>>>>>>> logging. I
> >>>>>>>>>>>> always discover this at the exact *worst* moment, too.
> (Normally
> >>>> at
> >>>>>>>> 3 AM
> >>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the
> same
> >>>> way
> >>>>>>>> I
> >>>>>>>>>>> could
> >>>>>>>>>>>> probably be convinced to remove it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could also meet halfway and say that when a customized
> >>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we
> log
> >>>> at
> >>>>>>>> DEBUG
> >>>>>>>>>>>> level instead of WARN level. Would that alternative be
> appealing
> >>>> to
> >>>>>>>> you?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
> >>>> wangguoz@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and
> > it
> >>>>>>>> looks
> >>>>>>>>>>>>> good.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> About whether record collector should still internally log
> the
> >>>>>>>> error in
> >>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
> >>>> does. I
> >>>>>>>>>>>>> personally would prefer only to log if the returned value is
> >>>> FAIL
> >>>>>>>> to
> >>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger
> the
> >>>>>>>>>>> exception
> >>>>>>>>>>>>> handler.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
> this
> >>>>>>>> KIP as
> >>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
> again!
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot
> to
> >>>>>>>>>>> mention
> >>>>>>>>>>>>> them
> >>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The logging is actually provided for in the record
> collector.
> >>>>>>>>>>>> Whenever
> >>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
> >>>>>>>>>>>> *impossible*
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> write a handler that totally suppresses production
> exceptions
> >>>>>>>> from
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>> log.
> >>>>>>>>>>>>>>> As such, the default continue handler just returns the
> >>>> continue
> >>>>>>>>>>>> value.
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
> >>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> >>>>>>>>>>>> matthias@confluent.io
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One more comment.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> You mention a default implementation for the handler that
> >>>>>>>> fails. I
> >>>>>>>>>>>>>>>> think, this should be part of the public API and thus
> should
> >>>>>>>> have
> >>>>>>>>>>> a
> >>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We could also add a second implementation for the
> >>>>>>>> log-and-move-on
> >>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This
> class
> >>>>>>>> should
> >>>>>>>>>>>>> also
> >>>>>>>>>>>>>>>> be part of public API (so users can just set in the
> config)
> >>>>>>>> with a
> >>>>>>>>>>>>>>>> proper name.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
> >>>> compatible
> >>>>>>>>>>>> with
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
> > whenever
> >>>>>>>> one
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> PRs
> >>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if
> 4148
> >>>> goes
> >>>>>>>>>>> in
> >>>>>>>>>>>>>>>> first. :)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> >>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
> >>>> another
> >>>>>>>>>>>>> ongoing
> >>>>>>>>>>>>>> PR
> >>>>>>>>>>>>>>>>>> that may touch the same classes that you are working on:
> >>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR
> and
> >>>> see
> >>>>>>>>>>> if
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>> compatible with your changes.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
> >>>> matt@frmr.me>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
> >>>>>>>>>>> currently
> >>>>>>>>>>>>>>>> written:
> >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
> needs
> >>>>>>>>>>> some
> >>>>>>>>>>>>>> tests
> >>>>>>>>>>>>>>>>>>> added,
> >>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
> > discuss,
> >>>>>>>>>>>> please
> >>>>>>>>>>>>>> let
> >>>>>>>>>>>>>>>> me
> >>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
> I'll
> >>>>>>>>>>>> probably
> >>>>>>>>>>>>>>>> start
> >>>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>> [VOTE] thread.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <
> matt@frmr.me
> >>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
> >>>> either
> >>>>>>>>>>>> want
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
> >>>> that I
> >>>>>>>>>>>> can
> >>>>>>>>>>>>>> spin
> >>>>>>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
> >>>> start
> >>>>>>>>>>> up
> >>>>>>>>>>>>>> after
> >>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message
> we
> >>>>>>>>>>> failed
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> produce
> >>>>>>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later”
> opens
> >>>> you
> >>>>>>>>>>> up
> >>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> putting
> >>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
> >>>> order.
> >>>>>>>>>>>>> However
> >>>>>>>>>>>>>>>> if
> >>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> >>>>>>>>>>>>> wangguoz@gmail.com>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
> > is
> >>>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> while
> >>>>>>>>>>> trying
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I
> realized
> >>>> that
> >>>>>>>>>>>>> since
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we
> may
> >>>> not
> >>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
> >>>>>>>> future
> >>>>>>>>>>>>>>>>>> processing.
> >>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
> >>>>>>>> matt@frmr.me
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
> >>>> made
> >>>>>>>>>>> it
> >>>>>>>>>>>> so
> >>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
> more
> >>>>>>>>>>>> closely
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need
> to
> >>>>>>>>>>> reply
> >>>>>>>>>>>>>>>> to.....
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
> >>>>>>>> around
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
> be
> >>>>>>>>>>>> generics
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
> >>>> is a
> >>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics
> would
> >>>> just
> >>>>>>>>>>>> make
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>>>>>> harder
> >>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>> well
> >>>>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
> > is
> >>>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
> while
> >>>>>>>>>>> trying
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>> following
> >>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know
> what
> >>>> you
> >>>>>>>>>>>>> think.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
> >>>> matt@frmr.me
> >>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
> >>>> now
> >>>>>>>>>>> and
> >>>>>>>>>>>> am
> >>>>>>>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
> >>>>>>>> conversation
> >>>>>>>>>>>>> later
> >>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>> week.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> >>>>>>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
> >>>>>>>>>>>>>> ProductionExceptionHandlerResp
> >>>>>>>>>>>>>>>>>>> onse,
> >>>>>>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
> actual
> >>>>>>>>>>> added
> >>>>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>> described).
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters
> are
> >>>>>>>>>>> around
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
> be
> >>>>>>>>>>>> generics
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>>>>>> `RecordMetadata`
> >>>>>>>>>>>>>> as
> >>>>>>>>>>>>>>>>>>>>> well in
> >>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> >>>> following
> >>>>>>>>>>>>>> handling
> >>>>>>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1)
> throw
> >>>>>>>>>>>> exception
> >>>>>>>>>>>>>>>>>>>>>> immediately
> >>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error
> > and
> >>>>>>>>>>> drop
> >>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a
> specific
> >>>>>>>>>>> "error"
> >>>>>>>>>>>>>> Kafka
> >>>>>>>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>> https://kafka.apache.org/documentation/#kafka_streams_
> >>>>>>>>>>>>>> monitoring
> >>>>>>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>> monitoring.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> >>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
> > uses
> >>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for
> the
> >>>>>>>>>>>> config
> >>>>>>>>>>>>>>>>>>> name. No
> >>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
> >>>> should
> >>>>>>>>>>> use
> >>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> >>>>>>>>>>> matt@frmr.me
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record
> is
> >>>>>>>>>>>>> possible,
> >>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
> >>>>>>>>>>> However,
> >>>>>>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>>>>>> we're
> >>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have
> no
> >>>>>>>>>>>> guarantee
> >>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the
> time
> >>>> this
> >>>>>>>>>>>>>>>>>>> particular
> >>>>>>>>>>>>>>>>>>>>>>>> handler
> >>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
> >>>>>>>>>>> something
> >>>>>>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> >>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
> >>>>>>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> >>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming
> and
> >>>>>>>>>>> return
> >>>>>>>>>>>>> type
> >>>>>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>>>>>> I’m
> >>>>>>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and
> the
> >>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>>> itself
> >>>>>>>>>>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J.
> Sax <
> >>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design
> (parameter
> >>>>>>>>>>> naming,
> >>>>>>>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>>>>>>>>>> types,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get
> a
> >>>>>>>>>>> unified
> >>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>>>> experience.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
> update
> >>>> the
> >>>>>>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
> >>>> KIP ?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name
> should
> >>>> be
> >>>>>>>>>>>>>>>>>>> specified.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt
> Farmer <
> >>>>>>>>>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
> >>>> that I
> >>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>> filed
> >>>>>>>>>>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> >>>>>>>>>>>> error+handling++when+Kafka+
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback
> from
> >>>>>>>>>>> folks
> >>>>>>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can
> contribute
> >>>> back.
> >>>>>>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >
>
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks a lot for the update! Great write-up! Very clearly explained what
the change will look like!

Looks good to me. No further comments from my side.


-Matthias


On 12/5/17 9:14 AM, Matt Farmer wrote:
> I have updated this KIP accordingly.
> 
> Can you please take a look and let me know if what I wrote looks correct to
> you?
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce
> 
> Thanks!
> 
> Matt
> 
> 
> On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:
> 
> Hey Matthias, thanks for getting back to me.
> 
> That's fine. But if we add it to `test` package, we don't need to talk
> about it in the KIP. `test` is not public API.
> 
> Yes, that makes sense. It was in the KIP originally because I was, at one
> point, planning on including it. We can remove it now that we’ve decided we
> won’t include it in the public API.
> 
> Understood. That makes sense. We should explain this clearly in the KIP
> and maybe log all other following exceptions at DEBUG level?
> 
> 
> I thought it was clear in the KIP, but I can go back and double check my
> wording and revise it to try and make it clearer.
> 
> I’ll take a look at doing more work on the KIP and the Pull Request
> tomorrow.
> 
> Thanks again!
> 
> On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (matthias@confluent.io)
> wrote:
> 
> Hey,
> 
> About your questions:
> 
>>>> Acknowledged, so is ProducerFencedException the only kind of exception I
>>>> need to change my behavior on? Or are there other types I need to
> check? Is
>>>> there a comprehensive list somewhere?
> 
> I cannot think if any other atm. We should list all fatal exceptions for
> which we don't call the handler and explain why (exception is "global"
> and will affect all other records, too | ProducerFenced is self-healing).
> 
> We started to collect and categorize exception here (not completed yet):
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
> :
> 
> This list should be a good starting point though.
> 
>> I include it in the test package because I have tests that assert that if
>> the record collector impl encounters an Exception and receives a CONTINUE
>> that it actually does CONTINUE.
> 
> That's fine. But if we add it to `test` package, we don't need to talk
> about it in the KIP. `test` is not public API.
> 
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
> 
> Understood. That makes sense. We should explain this clearly in the KIP
> and maybe log all other following exceptions at DEBUG level?
> 
> 
> -Matthias
> 
> 
> On 12/1/17 11:43 AM, Matt Farmer wrote:
>> Bump! It's been three days here and I haven't seen any further feedback.
>> Eager to get this resolved, approved, and merged. =)
>>
>> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
>>
>>> Hi there, sorry for the delay in responding. Last week had a holiday and
>>> several busy work days in it so I'm just now getting around to
> responding.
>>>
>>>> We would only exclude
>>>> exception Streams can handle itself (like ProducerFencedException) --
>>>> thus, if the handler has code to react to this, it would not be bad, as
>>>> this code is just never called.
>>> [...]
>>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>>> for fatal exception.
>>>
>>> Acknowledged, so is ProducerFencedException the only kind of exception I
>>> need to change my behavior on? Or are there other types I need to check?
> Is
>>> there a comprehensive list somewhere?
>>>
>>>> About the "always continue" case. Sounds good to me to remove it (not
>>>> sure why we need it in test package?)
>>>
>>> I include it in the test package because I have tests that assert that if
>>> the record collector impl encounters an Exception and receives a CONTINUE
>>> that it actually does CONTINUE.
>>>
>>>> What is there reasoning for invoking the handler only for the first
>>>> exception?
>>>
>>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>>> result would just be ignored. Presumably, after a FAIL has been returned,
>>> subsequent exceptions are likely to be repeats or noise from my
>>> understanding of the code paths. If this is incorrect we can revisit.
>>>
>>> Once I get the answers to these questions I can make another pass on the
>>> pull request!
>>>
>>> Matt
>>>
>>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>
>>>> Thanks for following up!
>>>>
>>>> One thought about an older reply from you:
>>>>
>>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
> to
>>>>>>>> make a decision for streams. There may also be desirable side
>>>> effects that
>>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>>> occur,
>>>>>>>> as well. We should still give them the hooks to preform those side
>>>> effects.
>>>>
>>>> And your follow up:
>>>>
>>>>>> - I think I would rather invoke it for all exceptions that could
>>>> occur
>>>>>> from attempting to produce - even those exceptions were returning
>>>> CONTINUE
>>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>>> is a
>>>>>> different type for exceptions that are totally fatal (for example a
>>>>>> FatalStreamsException or some sort), maintaining a list of
>>>> exceptions that
>>>>>> you can intercept with this handler and exceptions you cannot would
>>>> be
>>>>>> really error-prone and hard to keep correct.
>>>>
>>>> I understand what you are saying, however, consider that Streams needs
>>>> to die for a fatal exception. Thus, if you call the handler for a fatal
>>>> exception, we would need to ignore the return value and fail -- this
>>>> seems to be rather intuitive. Furthermore, users can register an
>>>> uncaught-exception-handler and side effects for fatal errors can be
>>>> triggered there.
>>>>
>>>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>>>> "totally fatal" exception -- there is no superlative to fatal from my
>>>> understanding.
>>>>
>>>> About maintaining a list of exceptions: I don't think this is too hard,
>>>> and users also don't need to worry about it IMHO. We would only exclude
>>>> exception Streams can handle itself (like ProducerFencedException) --
>>>> thus, if the handler has code to react to this, it would not be bad, as
>>>> this code is just never called. In case that there is an exception
>>>> Streams could actually handle but we still call the handler (ie, bug),
>>>> there should be no harm either -- the handler needs to return either
>>>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>>>> dies---as request by the user(!)---even if Streams could actually handle
>>>> the error and move on. But this seems to be not a issue from my point of
>>>> view.
>>>>
>>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>>> for fatal exception.
>>>>
>>>>
>>>>
>>>> About the "always continue" case. Sounds good to me to remove it (not
>>>> sure why we need it in test package?) and to rename the "failing"
>>>> handler to "Default" (even if "default" is less descriptive and I would
>>>> still prefer "Fail" in the name).
>>>>
>>>>
>>>> Last question:
>>>>
>>>>>> - Continue to *only* invoke it on the first exception that we
>>>>>> encounter (before sendException is set)
>>>>
>>>>
>>>> What is there reasoning for invoking the handler only for the first
>>>> exception?
>>>>
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>>> Alright, here are some updates I'm planning to make after thinking on
>>>> this
>>>>> for awhile:
>>>>>
>>>>> - Given that the "always continue" handler isn't something I'd
>>>> recommend
>>>>> for production use as is, I'm going to move it into the test
>>>> namespace and
>>>>> remove it from mention in the public API.
>>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>>>>> "DefaultProductionExceptionHandler"
>>>>> - Given that the API for the exception handler involves being
>>>> invoked by
>>>>> streams to make a decision about whether or not to continue — I
>>>> think that
>>>>> we should:
>>>>> - Continue to *only* invoke it on the first exception that we
>>>>> encounter (before sendException is set)
>>>>> - Stop invoking it for the self-healing fenced exceptions.
>>>>> - I think I would rather invoke it for all exceptions that could
>>>> occur
>>>>> from attempting to produce - even those exceptions were returning
>>>> CONTINUE
>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>>> is a
>>>>> different type for exceptions that are totally fatal (for example a
>>>>> FatalStreamsException or some sort), maintaining a list of
>>>> exceptions that
>>>>> you can intercept with this handler and exceptions you cannot would
>>>> be
>>>>> really error-prone and hard to keep correct.
>>>>> - I'm happy to file a KIP for the creation of this new Exception
>>>> type
>>>>> if there is interest.
>>>>>
>>>>> @ Matthias — What do you think about the above?
>>>>>
>>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> I responded before reading your code review and didn't see the bit
>>>> about
>>>>>> how ProducerFencedException is self-healing. This error handling logic
>>>> is
>>>>>> *quite* confusing to reason about... I think I'm going to sit down
> with
>>>>>> the code a bit more today, but I'm inclined to think that if the
> fenced
>>>>>> exceptions are, indeed, self healing that we still invoke the handler
>>>> but
>>>>>> ignore its result for those exceptions.
>>>>>>
>>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> Following up here...
>>>>>>>
>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>> handler implementation names -- it sounds "cleaner" to me without it.
>>>>>>>
>>>>>>> Let me think on this. I generally prefer expressiveness to
> clean-ness,
>>>>>>> and I think that comes out in the names I chose for things. The straw
>>>> man
>>>>>>> in my head is the person that tried to substitute in the
>>>> "AlwaysContinue"
>>>>>>> variant and the "Always" is a trigger to really consider whether or
>>>> not
>>>>>>> they *always* want to try to continue.
>>>>>>>
>>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
> variant
>>>>>>> isn't something I'd recommend generally in a production system.
>>>> Ideally
>>>>>>> these handlers are targeted at handling a specific series of
>>>> exceptions
>>>>>>> that a user wants to ignore, and not ignoring all exceptions. More on
>>>> this
>>>>>>> in a minute.
>>>>>>>
>>>>>>>> For the first category, it seems to not make sense to call the
>>>> handle but
>>>>>>> Streams should always fail. If we follow this design, the KIP should
>>>> list
>>>>>>> all exceptions for which the handler is not called.
>>>>>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>>>>>> make a decision for streams. There may also be desirable side effects
>>>> that
>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>>> occur,
>>>>>>> as well. We should still give them the hooks to preform those side
>>>> effects.
>>>>>>>
>>>>>>> In light of the above, I'm thinking that the
>>>>>>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>>>>>>> removed from the public API and moved into tests since that's where
>>>> it's
>>>>>>> most useful. The more I think on it, the more I feel that having that
>>>> in
>>>>>>> the public API is a landmine. If you agree, then, we can rename the
>>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>>> "DefaultProductionExceptionHandler".
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>>> matthias@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>>
>>>>>>>> There are different types of exceptions that could occur. Some apply
>>>> to
>>>>>>>> all records (like Authorization exception) while others are for
>>>> single
>>>>>>>> records only (like record too large).
>>>>>>>>
>>>>>>>> For the first category, it seems to not make sense to call the
> handle
>>>>>>>> but Streams should always fail. If we follow this design, the KIP
>>>> should
>>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>>> Just catching up on this KIP.
>>>>>>>>>
>>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>>> handler
>>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying to
>>>> also
>>>>>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>>>>>>>>>> connections..... I'll try again a bit later but please consider
> the
>>>>>>>> KIP
>>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> That makes sense.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Guozhang
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times
> by
>>>>>>>>>>>> unintentionally shipping code that fails to properly implement
>>>>>>>> logging. I
>>>>>>>>>>>> always discover this at the exact *worst* moment, too. (Normally
>>>> at
>>>>>>>> 3 AM
>>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the same
>>>> way
>>>>>>>> I
>>>>>>>>>>> could
>>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>>
>>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log
>>>> at
>>>>>>>> DEBUG
>>>>>>>>>>>> level instead of WARN level. Would that alternative be appealing
>>>> to
>>>>>>>> you?
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>>> wangguoz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and
> it
>>>>>>>> looks
>>>>>>>>>>>>> good.
>>>>>>>>>>>>>
>>>>>>>>>>>>> About whether record collector should still internally log the
>>>>>>>> error in
>>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>>> does. I
>>>>>>>>>>>>> personally would prefer only to log if the returned value is
>>>> FAIL
>>>>>>>> to
>>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger the
>>>>>>>>>>> exception
>>>>>>>>>>>>> handler.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>>>>>>>> KIP as
>>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>>>>>>>>>> mention
>>>>>>>>>>>>> them
>>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The logging is actually provided for in the record collector.
>>>>>>>>>>>> Whenever
>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>>>>>> *impossible*
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> write a handler that totally suppresses production exceptions
>>>>>>>> from
>>>>>>>>>>>> the
>>>>>>>>>>>>>> log.
>>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>>> continue
>>>>>>>>>>>> value.
>>>>>>>>>>>>> I
>>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You mention a default implementation for the handler that
>>>>>>>> fails. I
>>>>>>>>>>>>>>>> think, this should be part of the public API and thus should
>>>>>>>> have
>>>>>>>>>>> a
>>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>>> log-and-move-on
>>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This class
>>>>>>>> should
>>>>>>>>>>>>> also
>>>>>>>>>>>>>>>> be part of public API (so users can just set in the config)
>>>>>>>> with a
>>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>>> compatible
>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
> whenever
>>>>>>>> one
>>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
>>>> goes
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>>> another
>>>>>>>>>>>>> ongoing
>>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
>>>> see
>>>>>>>>>>> if
>>>>>>>>>>>> it
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>>> matt@frmr.me>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>>>>>> currently
>>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>>>>>>>>>> some
>>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
> discuss,
>>>>>>>>>>>> please
>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <matt@frmr.me
>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
>>>> either
>>>>>>>>>>>> want
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
>>>> that I
>>>>>>>>>>>> can
>>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
>>>> start
>>>>>>>>>>> up
>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>>>>>>>>>> failed
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
>>>> you
>>>>>>>>>>> up
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>>>> order.
>>>>>>>>>>>>> However
>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
> is
>>>>>>>>>>> that
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>>> trying
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
>>>> that
>>>>>>>>>>>>> since
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
>>>> not
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>>>>>>>> future
>>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
>>>> made
>>>>>>>>>>> it
>>>>>>>>>>>> so
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>>>>>>>>>>> closely
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>>>>>>>>>> reply
>>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>>> around
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>>> generics
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
>>>> is a
>>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
>>>> just
>>>>>>>>>>>> make
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
> is
>>>>>>>>>>> that
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>>> trying
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>> following
>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
>>>> you
>>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>>> matt@frmr.me
>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
>>>> now
>>>>>>>>>>> and
>>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>>> conversation
>>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>>>>>>>>>> added
>>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>>>>>> around
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>>> generics
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>>> following
>>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error
> and
>>>>>>>>>>> drop
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>>>>>>>>>> "error"
>>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>>
>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
> uses
>>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>>>> should
>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
>>>> this
>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>>>>>> something
>>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>>>>>>>>>> return
>>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>>>>>>>>>> unified
>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update
>>>> the
>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
>>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should
>>>> be
>>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
>>>> that I
>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>>>>>>>>>> folks
>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
>>>> back.
>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>
>>>>
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I have updated this KIP accordingly.

Can you please take a look and let me know if what I wrote looks correct to
you?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce

Thanks!

Matt


On December 4, 2017 at 9:39:13 PM, Matt Farmer (matt@frmr.me) wrote:

Hey Matthias, thanks for getting back to me.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

Yes, that makes sense. It was in the KIP originally because I was, at one
point, planning on including it. We can remove it now that we’ve decided we
won’t include it in the public API.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


I thought it was clear in the KIP, but I can go back and double check my
wording and revise it to try and make it clearer.

I’ll take a look at doing more work on the KIP and the Pull Request
tomorrow.

Thanks again!

On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (matthias@confluent.io)
wrote:

Hey,

About your questions:

>>> Acknowledged, so is ProducerFencedException the only kind of exception I
>>> need to change my behavior on? Or are there other types I need to
check? Is
>>> there a comprehensive list somewhere?

I cannot think if any other atm. We should list all fatal exceptions for
which we don't call the handler and explain why (exception is "global"
and will affect all other records, too | ProducerFenced is self-healing).

We started to collect and categorize exception here (not completed yet):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions
:

This list should be a good starting point though.

> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


-Matthias


On 12/1/17 11:43 AM, Matt Farmer wrote:
> Bump! It's been three days here and I haven't seen any further feedback.
> Eager to get this resolved, approved, and merged. =)
>
> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
>
>> Hi there, sorry for the delay in responding. Last week had a holiday and
>> several busy work days in it so I'm just now getting around to
responding.
>>
>>> We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called.
>> [...]
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>
>> Acknowledged, so is ProducerFencedException the only kind of exception I
>> need to change my behavior on? Or are there other types I need to check?
Is
>> there a comprehensive list somewhere?
>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?)
>>
>> I include it in the test package because I have tests that assert that if
>> the record collector impl encounters an Exception and receives a CONTINUE
>> that it actually does CONTINUE.
>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
>>
>> Once I get the answers to these questions I can make another pass on the
>> pull request!
>>
>> Matt
>>
>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks for following up!
>>>
>>> One thought about an older reply from you:
>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
to
>>>>>>> make a decision for streams. There may also be desirable side
>>> effects that
>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>
>>> And your follow up:
>>>
>>>>> - I think I would rather invoke it for all exceptions that could
>>> occur
>>>>> from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>>> different type for exceptions that are totally fatal (for example a
>>>>> FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>>> you can intercept with this handler and exceptions you cannot would
>>> be
>>>>> really error-prone and hard to keep correct.
>>>
>>> I understand what you are saying, however, consider that Streams needs
>>> to die for a fatal exception. Thus, if you call the handler for a fatal
>>> exception, we would need to ignore the return value and fail -- this
>>> seems to be rather intuitive. Furthermore, users can register an
>>> uncaught-exception-handler and side effects for fatal errors can be
>>> triggered there.
>>>
>>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>>> "totally fatal" exception -- there is no superlative to fatal from my
>>> understanding.
>>>
>>> About maintaining a list of exceptions: I don't think this is too hard,
>>> and users also don't need to worry about it IMHO. We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called. In case that there is an exception
>>> Streams could actually handle but we still call the handler (ie, bug),
>>> there should be no harm either -- the handler needs to return either
>>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>>> dies---as request by the user(!)---even if Streams could actually handle
>>> the error and move on. But this seems to be not a issue from my point of
>>> view.
>>>
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>>
>>>
>>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?) and to rename the "failing"
>>> handler to "Default" (even if "default" is less descriptive and I would
>>> still prefer "Fail" in the name).
>>>
>>>
>>> Last question:
>>>
>>>>> - Continue to *only* invoke it on the first exception that we
>>>>> encounter (before sendException is set)
>>>
>>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>>
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>> Alright, here are some updates I'm planning to make after thinking on
>>> this
>>>> for awhile:
>>>>
>>>> - Given that the "always continue" handler isn't something I'd
>>> recommend
>>>> for production use as is, I'm going to move it into the test
>>> namespace and
>>>> remove it from mention in the public API.
>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>>>> "DefaultProductionExceptionHandler"
>>>> - Given that the API for the exception handler involves being
>>> invoked by
>>>> streams to make a decision about whether or not to continue — I
>>> think that
>>>> we should:
>>>> - Continue to *only* invoke it on the first exception that we
>>>> encounter (before sendException is set)
>>>> - Stop invoking it for the self-healing fenced exceptions.
>>>> - I think I would rather invoke it for all exceptions that could
>>> occur
>>>> from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>> may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>> different type for exceptions that are totally fatal (for example a
>>>> FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>> you can intercept with this handler and exceptions you cannot would
>>> be
>>>> really error-prone and hard to keep correct.
>>>> - I'm happy to file a KIP for the creation of this new Exception
>>> type
>>>> if there is interest.
>>>>
>>>> @ Matthias — What do you think about the above?
>>>>
>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>
>>>>> I responded before reading your code review and didn't see the bit
>>> about
>>>>> how ProducerFencedException is self-healing. This error handling logic
>>> is
>>>>> *quite* confusing to reason about... I think I'm going to sit down
with
>>>>> the code a bit more today, but I'm inclined to think that if the
fenced
>>>>> exceptions are, indeed, self healing that we still invoke the handler
>>> but
>>>>> ignore its result for those exceptions.
>>>>>
>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Following up here...
>>>>>>
>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>> handler implementation names -- it sounds "cleaner" to me without it.
>>>>>>
>>>>>> Let me think on this. I generally prefer expressiveness to
clean-ness,
>>>>>> and I think that comes out in the names I chose for things. The straw
>>> man
>>>>>> in my head is the person that tried to substitute in the
>>> "AlwaysContinue"
>>>>>> variant and the "Always" is a trigger to really consider whether or
>>> not
>>>>>> they *always* want to try to continue.
>>>>>>
>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
variant
>>>>>> isn't something I'd recommend generally in a production system.
>>> Ideally
>>>>>> these handlers are targeted at handling a specific series of
>>> exceptions
>>>>>> that a user wants to ignore, and not ignoring all exceptions. More on
>>> this
>>>>>> in a minute.
>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the
>>> handle but
>>>>>> Streams should always fail. If we follow this design, the KIP should
>>> list
>>>>>> all exceptions for which the handler is not called.
>>>>>>
>>>>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>>>>> make a decision for streams. There may also be desirable side effects
>>> that
>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>>>>
>>>>>> In light of the above, I'm thinking that the
>>>>>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>>>>>> removed from the public API and moved into tests since that's where
>>> it's
>>>>>> most useful. The more I think on it, the more I feel that having that
>>> in
>>>>>> the public API is a landmine. If you agree, then, we can rename the
>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>> "DefaultProductionExceptionHandler".
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>> matthias@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>
>>>>>>> There are different types of exceptions that could occur. Some apply
>>> to
>>>>>>> all records (like Authorization exception) while others are for
>>> single
>>>>>>> records only (like record too large).
>>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the
handle
>>>>>>> but Streams should always fail. If we follow this design, the KIP
>>> should
>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>> Just catching up on this KIP.
>>>>>>>>
>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>> handler
>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying to
>>> also
>>>>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>>>>>>>>> connections..... I'll try again a bit later but please consider
the
>>>>>>> KIP
>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>
>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That makes sense.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times
by
>>>>>>>>>>> unintentionally shipping code that fails to properly implement
>>>>>>> logging. I
>>>>>>>>>>> always discover this at the exact *worst* moment, too. (Normally
>>> at
>>>>>>> 3 AM
>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the same
>>> way
>>>>>>> I
>>>>>>>>>> could
>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>
>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log
>>> at
>>>>>>> DEBUG
>>>>>>>>>>> level instead of WARN level. Would that alternative be appealing
>>> to
>>>>>>> you?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>> wangguoz@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and
it
>>>>>>> looks
>>>>>>>>>>>> good.
>>>>>>>>>>>>
>>>>>>>>>>>> About whether record collector should still internally log the
>>>>>>> error in
>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>> does. I
>>>>>>>>>>>> personally would prefer only to log if the returned value is
>>> FAIL
>>>>>>> to
>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger the
>>>>>>>>>> exception
>>>>>>>>>>>> handler.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>>>>>>> KIP as
>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>>>>>>>>> mention
>>>>>>>>>>>> them
>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The logging is actually provided for in the record collector.
>>>>>>>>>>> Whenever
>>>>>>>>>>>> a
>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>>>>> *impossible*
>>>>>>>>>>>> to
>>>>>>>>>>>>>> write a handler that totally suppresses production exceptions
>>>>>>> from
>>>>>>>>>>> the
>>>>>>>>>>>>> log.
>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>> continue
>>>>>>>>>>> value.
>>>>>>>>>>>> I
>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mention a default implementation for the handler that
>>>>>>> fails. I
>>>>>>>>>>>>>>> think, this should be part of the public API and thus should
>>>>>>> have
>>>>>>>>>> a
>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>> log-and-move-on
>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This class
>>>>>>> should
>>>>>>>>>>>> also
>>>>>>>>>>>>>>> be part of public API (so users can just set in the config)
>>>>>>> with a
>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>> compatible
>>>>>>>>>>> with
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
whenever
>>>>>>> one
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
>>> goes
>>>>>>>>>> in
>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>> another
>>>>>>>>>>>> ongoing
>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
>>> see
>>>>>>>>>> if
>>>>>>>>>>> it
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>> matt@frmr.me>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>>>>> currently
>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>>>>>>>>> some
>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
discuss,
>>>>>>>>>>> please
>>>>>>>>>>>>> let
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>>>>>>>>>> probably
>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <matt@frmr.me
>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
>>> either
>>>>>>>>>>> want
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
>>> that I
>>>>>>>>>>> can
>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
>>> start
>>>>>>>>>> up
>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>>>>>>>>> failed
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
>>> you
>>>>>>>>>> up
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>>> order.
>>>>>>>>>>>> However
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
>>> that
>>>>>>>>>>>> since
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
>>> not
>>>>>>>>>> be
>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>>>>>>> future
>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
>>> made
>>>>>>>>>> it
>>>>>>>>>>> so
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>>>>>>>>>> closely
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>>>>>>>>> reply
>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>> around
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>> generics
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
>>> is a
>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
>>> just
>>>>>>>>>>> make
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
>>> you
>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>> matt@frmr.me
>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
>>> now
>>>>>>>>>> and
>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>> conversation
>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>>>>>>>>> added
>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>>>>> around
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>> generics
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error
and
>>>>>>>>>> drop
>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>>>>>>>>> "error"
>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>
>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
uses
>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>>> should
>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
>>> this
>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>>>>> something
>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>>>>>>>>> return
>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>>>>>>>>> unified
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update
>>> the
>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should
>>> be
>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
>>> that I
>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>>>>>>>>> folks
>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
>>> back.
>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>
>>>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Hey Matthias, thanks for getting back to me.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

Yes, that makes sense. It was in the KIP originally because I was, at one
point, planning on including it. We can remove it now that we’ve decided we
won’t include it in the public API.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


I thought it was clear in the KIP, but I can go back and double check my
wording and revise it to try and make it clearer.

I’ll take a look at doing more work on the KIP and the Pull Request
tomorrow.

Thanks again!

On December 4, 2017 at 5:50:33 PM, Matthias J. Sax (matthias@confluent.io)
wrote:

Hey,

About your questions:

>>> Acknowledged, so is ProducerFencedException the only kind of exception
I
>>> need to change my behavior on? Or are there other types I need to
check? Is
>>> there a comprehensive list somewhere?

I cannot think if any other atm. We should list all fatal exceptions for
which we don't call the handler and explain why (exception is "global"
and will affect all other records, too | ProducerFenced is self-healing).

We started to collect and categorize exception here (not completed yet):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions:


This list should be a good starting point though.

> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


-Matthias


On 12/1/17 11:43 AM, Matt Farmer wrote:
> Bump! It's been three days here and I haven't seen any further feedback.
> Eager to get this resolved, approved, and merged. =)
>
> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
>
>> Hi there, sorry for the delay in responding. Last week had a holiday and
>> several busy work days in it so I'm just now getting around to
responding.
>>
>>> We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called.
>> [...]
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>
>> Acknowledged, so is ProducerFencedException the only kind of exception I
>> need to change my behavior on? Or are there other types I need to check?
Is
>> there a comprehensive list somewhere?
>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?)
>>
>> I include it in the test package because I have tests that assert that
if
>> the record collector impl encounters an Exception and receives a
CONTINUE
>> that it actually does CONTINUE.
>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been
returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
>>
>> Once I get the answers to these questions I can make another pass on the
>> pull request!
>>
>> Matt
>>
>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks for following up!
>>>
>>> One thought about an older reply from you:
>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
to
>>>>>>> make a decision for streams. There may also be desirable side
>>> effects that
>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>
>>> And your follow up:
>>>
>>>>> - I think I would rather invoke it for all exceptions that could
>>> occur
>>>>> from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>>> may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>>> different type for exceptions that are totally fatal (for example a
>>>>> FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>>> you can intercept with this handler and exceptions you cannot would
>>> be
>>>>> really error-prone and hard to keep correct.
>>>
>>> I understand what you are saying, however, consider that Streams needs
>>> to die for a fatal exception. Thus, if you call the handler for a fatal
>>> exception, we would need to ignore the return value and fail -- this
>>> seems to be rather intuitive. Furthermore, users can register an
>>> uncaught-exception-handler and side effects for fatal errors can be
>>> triggered there.
>>>
>>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>>> "totally fatal" exception -- there is no superlative to fatal from my
>>> understanding.
>>>
>>> About maintaining a list of exceptions: I don't think this is too hard,
>>> and users also don't need to worry about it IMHO. We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called. In case that there is an exception
>>> Streams could actually handle but we still call the handler (ie, bug),
>>> there should be no harm either -- the handler needs to return either
>>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>>> dies---as request by the user(!)---even if Streams could actually
handle
>>> the error and move on. But this seems to be not a issue from my point
of
>>> view.
>>>
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>>
>>>
>>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?) and to rename the "failing"
>>> handler to "Default" (even if "default" is less descriptive and I would
>>> still prefer "Fail" in the name).
>>>
>>>
>>> Last question:
>>>
>>>>> - Continue to *only* invoke it on the first exception that we
>>>>> encounter (before sendException is set)
>>>
>>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>>
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>> Alright, here are some updates I'm planning to make after thinking on
>>> this
>>>> for awhile:
>>>>
>>>> - Given that the "always continue" handler isn't something I'd
>>> recommend
>>>> for production use as is, I'm going to move it into the test
>>> namespace and
>>>> remove it from mention in the public API.
>>>> - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>>>> "DefaultProductionExceptionHandler"
>>>> - Given that the API for the exception handler involves being
>>> invoked by
>>>> streams to make a decision about whether or not to continue — I
>>> think that
>>>> we should:
>>>> - Continue to *only* invoke it on the first exception that we
>>>> encounter (before sendException is set)
>>>> - Stop invoking it for the self-healing fenced exceptions.
>>>> - I think I would rather invoke it for all exceptions that could
>>> occur
>>>> from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>> may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>> different type for exceptions that are totally fatal (for example a
>>>> FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>> you can intercept with this handler and exceptions you cannot would
>>> be
>>>> really error-prone and hard to keep correct.
>>>> - I'm happy to file a KIP for the creation of this new Exception
>>> type
>>>> if there is interest.
>>>>
>>>> @ Matthias — What do you think about the above?
>>>>
>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>
>>>>> I responded before reading your code review and didn't see the bit
>>> about
>>>>> how ProducerFencedException is self-healing. This error handling
logic
>>> is
>>>>> *quite* confusing to reason about... I think I'm going to sit down
with
>>>>> the code a bit more today, but I'm inclined to think that if the
fenced
>>>>> exceptions are, indeed, self healing that we still invoke the handler
>>> but
>>>>> ignore its result for those exceptions.
>>>>>
>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Following up here...
>>>>>>
>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>> handler implementation names -- it sounds "cleaner" to me without
it.
>>>>>>
>>>>>> Let me think on this. I generally prefer expressiveness to
clean-ness,
>>>>>> and I think that comes out in the names I chose for things. The
straw
>>> man
>>>>>> in my head is the person that tried to substitute in the
>>> "AlwaysContinue"
>>>>>> variant and the "Always" is a trigger to really consider whether or
>>> not
>>>>>> they *always* want to try to continue.
>>>>>>
>>>>>> To be truthful, after some thought, using the "AlwaysContinue"
variant
>>>>>> isn't something I'd recommend generally in a production system.
>>> Ideally
>>>>>> these handlers are targeted at handling a specific series of
>>> exceptions
>>>>>> that a user wants to ignore, and not ignoring all exceptions. More
on
>>> this
>>>>>> in a minute.
>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the
>>> handle but
>>>>>> Streams should always fail. If we follow this design, the KIP should
>>> list
>>>>>> all exceptions for which the handler is not called.
>>>>>>
>>>>>> I strongly disagree here. The purpose of this handler isn't *just*
to
>>>>>> make a decision for streams. There may also be desirable side
effects
>>> that
>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>>>>
>>>>>> In light of the above, I'm thinking that the
>>>>>> "AlwaysContinueProductionExceptionHandler" variant should probably
be
>>>>>> removed from the public API and moved into tests since that's where
>>> it's
>>>>>> most useful. The more I think on it, the more I feel that having
that
>>> in
>>>>>> the public API is a landmine. If you agree, then, we can rename the
>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>> "DefaultProductionExceptionHandler".
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>> matthias@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>
>>>>>>> There are different types of exceptions that could occur. Some
apply
>>> to
>>>>>>> all records (like Authorization exception) while others are for
>>> single
>>>>>>> records only (like record too large).
>>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the
handle
>>>>>>> but Streams should always fail. If we follow this design, the KIP
>>> should
>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>> Just catching up on this KIP.
>>>>>>>>
>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>> handler
>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying to
>>> also
>>>>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>>>>>>>>> connections..... I'll try again a bit later but please consider
the
>>>>>>> KIP
>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>
>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That makes sense.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times
by
>>>>>>>>>>> unintentionally shipping code that fails to properly implement
>>>>>>> logging. I
>>>>>>>>>>> always discover this at the exact *worst* moment, too.
(Normally
>>> at
>>>>>>> 3 AM
>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the same
>>> way
>>>>>>> I
>>>>>>>>>> could
>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>
>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we
log
>>> at
>>>>>>> DEBUG
>>>>>>>>>>> level instead of WARN level. Would that alternative be
appealing
>>> to
>>>>>>> you?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>> wangguoz@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and
it
>>>>>>> looks
>>>>>>>>>>>> good.
>>>>>>>>>>>>
>>>>>>>>>>>> About whether record collector should still internally log the
>>>>>>> error in
>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>> does. I
>>>>>>>>>>>> personally would prefer only to log if the returned value is
>>> FAIL
>>>>>>> to
>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger the
>>>>>>>>>> exception
>>>>>>>>>>>> handler.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated
this
>>>>>>> KIP as
>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review
again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>>>>>>>>> mention
>>>>>>>>>>>> them
>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The logging is actually provided for in the record
collector.
>>>>>>>>>>> Whenever
>>>>>>>>>>>> a
>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>>>>> *impossible*
>>>>>>>>>>>> to
>>>>>>>>>>>>>> write a handler that totally suppresses production
exceptions
>>>>>>> from
>>>>>>>>>>> the
>>>>>>>>>>>>> log.
>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>> continue
>>>>>>>>>>> value.
>>>>>>>>>>>> I
>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mention a default implementation for the handler that
>>>>>>> fails. I
>>>>>>>>>>>>>>> think, this should be part of the public API and thus
should
>>>>>>> have
>>>>>>>>>> a
>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>> log-and-move-on
>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This class
>>>>>>> should
>>>>>>>>>>>> also
>>>>>>>>>>>>>>> be part of public API (so users can just set in the config)
>>>>>>> with a
>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>> compatible
>>>>>>>>>>> with
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens
whenever
>>>>>>> one
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
>>> goes
>>>>>>>>>> in
>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>> another
>>>>>>>>>>>> ongoing
>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
>>> see
>>>>>>>>>> if
>>>>>>>>>>> it
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>> matt@frmr.me>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>>>>> currently
>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still
needs
>>>>>>>>>> some
>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If there are more points that folks would like to
discuss,
>>>>>>>>>>> please
>>>>>>>>>>>>> let
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon
I'll
>>>>>>>>>>> probably
>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <matt@frmr.me
>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
>>> either
>>>>>>>>>>> want
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
>>> that I
>>>>>>>>>>> can
>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
>>> start
>>>>>>>>>> up
>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>>>>>>>>> failed
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
>>> you
>>>>>>>>>> up
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>>> order.
>>>>>>>>>>>> However
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
>>> that
>>>>>>>>>>>> since
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
>>> not
>>>>>>>>>> be
>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>>>>>>> future
>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
>>> made
>>>>>>>>>> it
>>>>>>>>>>> so
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up
more
>>>>>>>>>>> closely
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need
to
>>>>>>>>>> reply
>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>> around
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>> generics
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
>>> is a
>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
>>> just
>>>>>>>>>>> make
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding
is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred
while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
>>> you
>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>> matt@frmr.me
>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
>>> now
>>>>>>>>>> and
>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>> conversation
>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the
actual
>>>>>>>>>> added
>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>>>>> around
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or
be
>>>>>>>>>>> generics
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error
and
>>>>>>>>>> drop
>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>>>>>>>>> "error"
>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>
>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler
uses
>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for
the
>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>>> should
>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record
is
>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
>>> this
>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>>>>> something
>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>>>>>>>>> return
>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and
the
>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax
<
>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>>>>>>>>> unified
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll
update
>>> the
>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should
>>> be
>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer
<
>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
>>> that I
>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback
from
>>>>>>>>>> folks
>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
>>> back.
>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>
>>>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Hey,

About your questions:

>>> Acknowledged, so is ProducerFencedException the only kind of exception I
>>> need to change my behavior on? Or are there other types I need to check? Is
>>> there a comprehensive list somewhere?

I cannot think if any other atm. We should list all fatal exceptions for
which we don't call the handler and explain why (exception is "global"
and will affect all other records, too | ProducerFenced is self-healing).

We started to collect and categorize exception here (not completed yet):
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture#KafkaStreamsArchitecture-TypesofExceptions:

This list should be a good starting point though.

> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.

That's fine. But if we add it to `test` package, we don't need to talk
about it in the KIP. `test` is not public API.

> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.

Understood. That makes sense. We should explain this clearly in the KIP
and maybe log all other following exceptions at DEBUG level?


-Matthias


On 12/1/17 11:43 AM, Matt Farmer wrote:
> Bump! It's been three days here and I haven't seen any further feedback.
> Eager to get this resolved, approved, and merged. =)
> 
> On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:
> 
>> Hi there, sorry for the delay in responding. Last week had a holiday and
>> several busy work days in it so I'm just now getting around to responding.
>>
>>> We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called.
>> [...]
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>
>> Acknowledged, so is ProducerFencedException the only kind of exception I
>> need to change my behavior on? Or are there other types I need to check? Is
>> there a comprehensive list somewhere?
>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?)
>>
>> I include it in the test package because I have tests that assert that if
>> the record collector impl encounters an Exception and receives a CONTINUE
>> that it actually does CONTINUE.
>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>
>> I didn't want to invoke the handler in places where the CONTINUE or FAIL
>> result would just be ignored. Presumably, after a FAIL has been returned,
>> subsequent exceptions are likely to be repeats or noise from my
>> understanding of the code paths. If this is incorrect we can revisit.
>>
>> Once I get the answers to these questions I can make another pass on the
>> pull request!
>>
>> Matt
>>
>> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks for following up!
>>>
>>> One thought about an older reply from you:
>>>
>>>>>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>>>>>> make a decision for streams. There may also be desirable side
>>> effects that
>>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>
>>> And your follow up:
>>>
>>>>>    - I think I would rather invoke it for all exceptions that could
>>> occur
>>>>>    from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>>>    may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>>>    different type for exceptions that are totally fatal (for example a
>>>>>    FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>>>    you can intercept with this handler and exceptions you cannot would
>>> be
>>>>>    really error-prone and hard to keep correct.
>>>
>>> I understand what you are saying, however, consider that Streams needs
>>> to die for a fatal exception. Thus, if you call the handler for a fatal
>>> exception, we would  need to ignore the return value and fail -- this
>>> seems to be rather intuitive. Furthermore, users can register an
>>> uncaught-exception-handler and side effects for fatal errors can be
>>> triggered there.
>>>
>>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>>> "totally fatal" exception -- there is no superlative to fatal from my
>>> understanding.
>>>
>>> About maintaining a list of exceptions: I don't think this is too hard,
>>> and users also don't need to worry about it IMHO. We would only exclude
>>> exception Streams can handle itself (like ProducerFencedException) --
>>> thus, if the handler has code to react to this, it would not be bad, as
>>> this code is just never called. In case that there is an exception
>>> Streams could actually handle but we still call the handler (ie, bug),
>>> there should be no harm either -- the handler needs to return either
>>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>>> dies---as request by the user(!)---even if Streams could actually handle
>>> the error and move on. But this seems to be not a issue from my point of
>>> view.
>>>
>>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>>> for fatal exception.
>>>
>>>
>>>
>>> About the "always continue" case. Sounds good to me to remove it (not
>>> sure why we need it in test package?) and to rename the "failing"
>>> handler to "Default" (even if "default" is less descriptive and I would
>>> still prefer "Fail" in the name).
>>>
>>>
>>> Last question:
>>>
>>>>>       - Continue to *only* invoke it on the first exception that we
>>>>>       encounter (before sendException is set)
>>>
>>>
>>> What is there reasoning for invoking the handler only for the first
>>> exception?
>>>
>>>
>>>
>>>
>>> -Matthias
>>>
>>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>>>> Alright, here are some updates I'm planning to make after thinking on
>>> this
>>>> for awhile:
>>>>
>>>>    - Given that the "always continue" handler isn't something I'd
>>> recommend
>>>>    for production use as is, I'm going to move it into the test
>>> namespace and
>>>>    remove it from mention in the public API.
>>>>    - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>>>>    "DefaultProductionExceptionHandler"
>>>>    - Given that the API for the exception handler involves being
>>> invoked by
>>>>    streams to make a decision about whether or not to continue — I
>>> think that
>>>>    we should:
>>>>       - Continue to *only* invoke it on the first exception that we
>>>>       encounter (before sendException is set)
>>>>       - Stop invoking it for the self-healing fenced exceptions.
>>>>    - I think I would rather invoke it for all exceptions that could
>>> occur
>>>>    from attempting to produce - even those exceptions were returning
>>> CONTINUE
>>>>    may not be a good idea (e.g. Authorization exception). Until there
>>> is a
>>>>    different type for exceptions that are totally fatal (for example a
>>>>    FatalStreamsException or some sort), maintaining a list of
>>> exceptions that
>>>>    you can intercept with this handler and exceptions you cannot would
>>> be
>>>>    really error-prone and hard to keep correct.
>>>>       - I'm happy to file a KIP for the creation of this new Exception
>>> type
>>>>       if there is interest.
>>>>
>>>> @ Matthias — What do you think about the above?
>>>>
>>>> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>
>>>>> I responded before reading your code review and didn't see the bit
>>> about
>>>>> how ProducerFencedException is self-healing. This error handling logic
>>> is
>>>>> *quite* confusing to reason about... I think I'm going to sit down with
>>>>> the code a bit more today, but I'm inclined to think that if the fenced
>>>>> exceptions are, indeed, self healing that we still invoke the handler
>>> but
>>>>> ignore its result for those exceptions.
>>>>>
>>>>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> Following up here...
>>>>>>
>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>> handler implementation names -- it sounds "cleaner" to me without it.
>>>>>>
>>>>>> Let me think on this. I generally prefer expressiveness to clean-ness,
>>>>>> and I think that comes out in the names I chose for things. The straw
>>> man
>>>>>> in my head is the person that tried to substitute in the
>>> "AlwaysContinue"
>>>>>> variant and the "Always" is a trigger to really consider whether or
>>> not
>>>>>> they *always* want to try to continue.
>>>>>>
>>>>>> To be truthful, after some thought, using the "AlwaysContinue" variant
>>>>>> isn't something I'd recommend generally in a production system.
>>> Ideally
>>>>>> these handlers are targeted at handling a specific series of
>>> exceptions
>>>>>> that a user wants to ignore, and not ignoring all exceptions. More on
>>> this
>>>>>> in a minute.
>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the
>>> handle but
>>>>>> Streams should always fail. If we follow this design, the KIP should
>>> list
>>>>>> all exceptions for which the handler is not called.
>>>>>>
>>>>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>>>>> make a decision for streams. There may also be desirable side effects
>>> that
>>>>>> users wish to cause when production exceptions occur. There may be
>>>>>> side-effects that they wish to cause when AuthenticationExceptions
>>> occur,
>>>>>> as well. We should still give them the hooks to preform those side
>>> effects.
>>>>>>
>>>>>> In light of the above, I'm thinking that the
>>>>>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>>>>>> removed from the public API and moved into tests since that's where
>>> it's
>>>>>> most useful. The more I think on it, the more I feel that having that
>>> in
>>>>>> the public API is a landmine. If you agree, then, we can rename the
>>>>>> "AlwaysFailProductionExceptionHandler" to
>>>>>> "DefaultProductionExceptionHandler".
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>>> matthias@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> I just review the PR, and there is one thing we should discuss.
>>>>>>>
>>>>>>> There are different types of exceptions that could occur. Some apply
>>> to
>>>>>>> all records (like Authorization exception) while others are for
>>> single
>>>>>>> records only (like record too large).
>>>>>>>
>>>>>>> For the first category, it seems to not make sense to call the handle
>>>>>>> but Streams should always fail. If we follow this design, the KIP
>>> should
>>>>>>> list all exceptions for which the handler is not called.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>>>>> Just catching up on this KIP.
>>>>>>>>
>>>>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>>>>> handler
>>>>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>>>>> It is agreed, then. I've updated the pull request. I'm trying to
>>> also
>>>>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>>>>>>>>> connections..... I'll try again a bit later but please consider the
>>>>>>> KIP
>>>>>>>>> updated for all intents and purposes. heh.
>>>>>>>>>
>>>>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That makes sense.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times by
>>>>>>>>>>> unintentionally shipping code that fails to properly implement
>>>>>>> logging. I
>>>>>>>>>>> always discover this at the exact *worst* moment, too. (Normally
>>> at
>>>>>>> 3 AM
>>>>>>>>>>> during an on-call shift. Hah.) However, if others feel the same
>>> way
>>>>>>> I
>>>>>>>>>> could
>>>>>>>>>>> probably be convinced to remove it.
>>>>>>>>>>>
>>>>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log
>>> at
>>>>>>> DEBUG
>>>>>>>>>>> level instead of WARN level. Would that alternative be appealing
>>> to
>>>>>>> you?
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>>> wangguoz@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and it
>>>>>>> looks
>>>>>>>>>>>> good.
>>>>>>>>>>>>
>>>>>>>>>>>> About whether record collector should still internally log the
>>>>>>> error in
>>>>>>>>>>>> addition to what the customized ProductionExceptionHandler
>>> does. I
>>>>>>>>>>>> personally would prefer only to log if the returned value is
>>> FAIL
>>>>>>> to
>>>>>>>>>>>> indicate that this thread is going to shutdown and trigger the
>>>>>>>>>> exception
>>>>>>>>>>>> handler.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>>>>>>> KIP as
>>>>>>>>>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>>>>>>>>> mention
>>>>>>>>>>>> them
>>>>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The logging is actually provided for in the record collector.
>>>>>>>>>>> Whenever
>>>>>>>>>>>> a
>>>>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>>>>> *impossible*
>>>>>>>>>>>> to
>>>>>>>>>>>>>> write a handler that totally suppresses production exceptions
>>>>>>> from
>>>>>>>>>>> the
>>>>>>>>>>>>> log.
>>>>>>>>>>>>>> As such, the default continue handler just returns the
>>> continue
>>>>>>>>>>> value.
>>>>>>>>>>>> I
>>>>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>>>>> matthias@confluent.io
>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mention a default implementation for the handler that
>>>>>>> fails. I
>>>>>>>>>>>>>>> think, this should be part of the public API and thus should
>>>>>>> have
>>>>>>>>>> a
>>>>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We could also add a second implementation for the
>>>>>>> log-and-move-on
>>>>>>>>>>>>>>> strategy, as both are the two most common cases. This class
>>>>>>> should
>>>>>>>>>>>> also
>>>>>>>>>>>>>>> be part of public API (so users can just set in the config)
>>>>>>> with a
>>>>>>>>>>>>>>> proper name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>>> compatible
>>>>>>>>>>> with
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> PR, but there will be a merge conflict that happens whenever
>>>>>>> one
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> PRs
>>>>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
>>> goes
>>>>>>>>>> in
>>>>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As for the implementation, please note that there is
>>> another
>>>>>>>>>>>> ongoing
>>>>>>>>>>>>> PR
>>>>>>>>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
>>> see
>>>>>>>>>> if
>>>>>>>>>>> it
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>>> matt@frmr.me>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>>>>> currently
>>>>>>>>>>>>>>> written:
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>>>>>>>>> some
>>>>>>>>>>>>> tests
>>>>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If there are more points that folks would like to discuss,
>>>>>>>>>>> please
>>>>>>>>>>>>> let
>>>>>>>>>>>>>>> me
>>>>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>>>>>>>>>> probably
>>>>>>>>>>>>>>> start
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <matt@frmr.me
>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
>>> either
>>>>>>>>>>> want
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
>>> that I
>>>>>>>>>>> can
>>>>>>>>>>>>> spin
>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
>>> start
>>>>>>>>>> up
>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>>>>>>>>> failed
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
>>> you
>>>>>>>>>> up
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>>> order.
>>>>>>>>>>>> However
>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
>>> that
>>>>>>>>>>>> since
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
>>> not
>>>>>>>>>> be
>>>>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>>>>>>> future
>>>>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
>>> made
>>>>>>>>>> it
>>>>>>>>>>> so
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>>>>>>>>>> closely
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>>>>>>>>> reply
>>>>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>> around
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>> generics
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
>>> is a
>>>>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
>>> just
>>>>>>>>>>> make
>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>>>>> trying
>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
>>> you
>>>>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>>> matt@frmr.me
>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
>>> now
>>>>>>>>>> and
>>>>>>>>>>> am
>>>>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>>>>> conversation
>>>>>>>>>>>> later
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>>>>>>>>> added
>>>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>>>>> around
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>>>>> generics
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>>>>> `RecordMetadata`
>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>>> following
>>>>>>>>>>>>> handling
>>>>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>>>>>>>>> exception
>>>>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>>>>>>>>>> drop
>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>>>>>>>>> "error"
>>>>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>>>>>
>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>>>>> monitoring
>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>>>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>>>>>>>>>> config
>>>>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>>> should
>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>>>>> matt@frmr.me
>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>>>>>>>>>> possible,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>>>>>>>>> However,
>>>>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>>>>>>>>> guarantee
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
>>> this
>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>>>>> something
>>>>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>>>>>>>>> return
>>>>>>>>>>>> type
>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>>>>>>>>> naming,
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>>>>>>>>> unified
>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update
>>> the
>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
>>> KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should
>>> be
>>>>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
>>> that I
>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>>>>>>>>> folks
>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
>>> back.
>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>
>>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Bump! It's been three days here and I haven't seen any further feedback.
Eager to get this resolved, approved, and merged. =)

On Tue, Nov 28, 2017 at 9:53 AM Matt Farmer <ma...@frmr.me> wrote:

> Hi there, sorry for the delay in responding. Last week had a holiday and
> several busy work days in it so I'm just now getting around to responding.
>
> > We would only exclude
> > exception Streams can handle itself (like ProducerFencedException) --
> > thus, if the handler has code to react to this, it would not be bad, as
> > this code is just never called.
> [...]
> > Thus, I am still in favor of not calling the ProductionExceptionHandler
> > for fatal exception.
>
> Acknowledged, so is ProducerFencedException the only kind of exception I
> need to change my behavior on? Or are there other types I need to check? Is
> there a comprehensive list somewhere?
>
> > About the "always continue" case. Sounds good to me to remove it (not
> > sure why we need it in test package?)
>
> I include it in the test package because I have tests that assert that if
> the record collector impl encounters an Exception and receives a CONTINUE
> that it actually does CONTINUE.
>
> > What is there reasoning for invoking the handler only for the first
> > exception?
>
> I didn't want to invoke the handler in places where the CONTINUE or FAIL
> result would just be ignored. Presumably, after a FAIL has been returned,
> subsequent exceptions are likely to be repeats or noise from my
> understanding of the code paths. If this is incorrect we can revisit.
>
> Once I get the answers to these questions I can make another pass on the
> pull request!
>
> Matt
>
> On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
>> Thanks for following up!
>>
>> One thought about an older reply from you:
>>
>> >>>> I strongly disagree here. The purpose of this handler isn't *just* to
>> >>>> make a decision for streams. There may also be desirable side
>> effects that
>> >>>> users wish to cause when production exceptions occur. There may be
>> >>>> side-effects that they wish to cause when AuthenticationExceptions
>> occur,
>> >>>> as well. We should still give them the hooks to preform those side
>> effects.
>>
>> And your follow up:
>>
>> >>    - I think I would rather invoke it for all exceptions that could
>> occur
>> >>    from attempting to produce - even those exceptions were returning
>> CONTINUE
>> >>    may not be a good idea (e.g. Authorization exception). Until there
>> is a
>> >>    different type for exceptions that are totally fatal (for example a
>> >>    FatalStreamsException or some sort), maintaining a list of
>> exceptions that
>> >>    you can intercept with this handler and exceptions you cannot would
>> be
>> >>    really error-prone and hard to keep correct.
>>
>> I understand what you are saying, however, consider that Streams needs
>> to die for a fatal exception. Thus, if you call the handler for a fatal
>> exception, we would  need to ignore the return value and fail -- this
>> seems to be rather intuitive. Furthermore, users can register an
>> uncaught-exception-handler and side effects for fatal errors can be
>> triggered there.
>>
>> Btw: an AuthorizationException is fatal -- not sure what you mean by an
>> "totally fatal" exception -- there is no superlative to fatal from my
>> understanding.
>>
>> About maintaining a list of exceptions: I don't think this is too hard,
>> and users also don't need to worry about it IMHO. We would only exclude
>> exception Streams can handle itself (like ProducerFencedException) --
>> thus, if the handler has code to react to this, it would not be bad, as
>> this code is just never called. In case that there is an exception
>> Streams could actually handle but we still call the handler (ie, bug),
>> there should be no harm either -- the handler needs to return either
>> CONTINUE or FAIL and we would obey. It could only happen, that Streams
>> dies---as request by the user(!)---even if Streams could actually handle
>> the error and move on. But this seems to be not a issue from my point of
>> view.
>>
>> Thus, I am still in favor of not calling the ProductionExceptionHandler
>> for fatal exception.
>>
>>
>>
>> About the "always continue" case. Sounds good to me to remove it (not
>> sure why we need it in test package?) and to rename the "failing"
>> handler to "Default" (even if "default" is less descriptive and I would
>> still prefer "Fail" in the name).
>>
>>
>> Last question:
>>
>> >>       - Continue to *only* invoke it on the first exception that we
>> >>       encounter (before sendException is set)
>>
>>
>> What is there reasoning for invoking the handler only for the first
>> exception?
>>
>>
>>
>>
>> -Matthias
>>
>> On 11/20/17 10:43 AM, Matt Farmer wrote:
>> > Alright, here are some updates I'm planning to make after thinking on
>> this
>> > for awhile:
>> >
>> >    - Given that the "always continue" handler isn't something I'd
>> recommend
>> >    for production use as is, I'm going to move it into the test
>> namespace and
>> >    remove it from mention in the public API.
>> >    - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>> >    "DefaultProductionExceptionHandler"
>> >    - Given that the API for the exception handler involves being
>> invoked by
>> >    streams to make a decision about whether or not to continue — I
>> think that
>> >    we should:
>> >       - Continue to *only* invoke it on the first exception that we
>> >       encounter (before sendException is set)
>> >       - Stop invoking it for the self-healing fenced exceptions.
>> >    - I think I would rather invoke it for all exceptions that could
>> occur
>> >    from attempting to produce - even those exceptions were returning
>> CONTINUE
>> >    may not be a good idea (e.g. Authorization exception). Until there
>> is a
>> >    different type for exceptions that are totally fatal (for example a
>> >    FatalStreamsException or some sort), maintaining a list of
>> exceptions that
>> >    you can intercept with this handler and exceptions you cannot would
>> be
>> >    really error-prone and hard to keep correct.
>> >       - I'm happy to file a KIP for the creation of this new Exception
>> type
>> >       if there is interest.
>> >
>> > @ Matthias — What do you think about the above?
>> >
>> > On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
>> >
>> >> I responded before reading your code review and didn't see the bit
>> about
>> >> how ProducerFencedException is self-healing. This error handling logic
>> is
>> >> *quite* confusing to reason about... I think I'm going to sit down with
>> >> the code a bit more today, but I'm inclined to think that if the fenced
>> >> exceptions are, indeed, self healing that we still invoke the handler
>> but
>> >> ignore its result for those exceptions.
>> >>
>> >> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>> >>
>> >>> Hi there,
>> >>>
>> >>> Following up here...
>> >>>
>> >>>> One tiny comment: I would prefer to remove the "Always" from the
>> >>> handler implementation names -- it sounds "cleaner" to me without it.
>> >>>
>> >>> Let me think on this. I generally prefer expressiveness to clean-ness,
>> >>> and I think that comes out in the names I chose for things. The straw
>> man
>> >>> in my head is the person that tried to substitute in the
>> "AlwaysContinue"
>> >>> variant and the "Always" is a trigger to really consider whether or
>> not
>> >>> they *always* want to try to continue.
>> >>>
>> >>> To be truthful, after some thought, using the "AlwaysContinue" variant
>> >>> isn't something I'd recommend generally in a production system.
>> Ideally
>> >>> these handlers are targeted at handling a specific series of
>> exceptions
>> >>> that a user wants to ignore, and not ignoring all exceptions. More on
>> this
>> >>> in a minute.
>> >>>
>> >>>> For the first category, it seems to not make sense to call the
>> handle but
>> >>> Streams should always fail. If we follow this design, the KIP should
>> list
>> >>> all exceptions for which the handler is not called.
>> >>>
>> >>> I strongly disagree here. The purpose of this handler isn't *just* to
>> >>> make a decision for streams. There may also be desirable side effects
>> that
>> >>> users wish to cause when production exceptions occur. There may be
>> >>> side-effects that they wish to cause when AuthenticationExceptions
>> occur,
>> >>> as well. We should still give them the hooks to preform those side
>> effects.
>> >>>
>> >>> In light of the above, I'm thinking that the
>> >>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>> >>> removed from the public API and moved into tests since that's where
>> it's
>> >>> most useful. The more I think on it, the more I feel that having that
>> in
>> >>> the public API is a landmine. If you agree, then, we can rename the
>> >>> "AlwaysFailProductionExceptionHandler" to
>> >>> "DefaultProductionExceptionHandler".
>> >>>
>> >>> Thoughts?
>> >>>
>> >>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <
>> matthias@confluent.io>
>> >>> wrote:
>> >>>
>> >>>> I just review the PR, and there is one thing we should discuss.
>> >>>>
>> >>>> There are different types of exceptions that could occur. Some apply
>> to
>> >>>> all records (like Authorization exception) while others are for
>> single
>> >>>> records only (like record too large).
>> >>>>
>> >>>> For the first category, it seems to not make sense to call the handle
>> >>>> but Streams should always fail. If we follow this design, the KIP
>> should
>> >>>> list all exceptions for which the handler is not called.
>> >>>>
>> >>>> WDYT?
>> >>>>
>> >>>>
>> >>>> -Matthias
>> >>>>
>> >>>>
>> >>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>> >>>>> Just catching up on this KIP.
>> >>>>>
>> >>>>> One tiny comment: I would prefer to remove the "Always" from the
>> >>>> handler
>> >>>>> implementation names -- it sounds "cleaner" to me without it.
>> >>>>>
>> >>>>>
>> >>>>> -Matthias
>> >>>>>
>> >>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>> >>>>>> It is agreed, then. I've updated the pull request. I'm trying to
>> also
>> >>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>> >>>>>> connections..... I'll try again a bit later but please consider the
>> >>>> KIP
>> >>>>>> updated for all intents and purposes. heh.
>> >>>>>>
>> >>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>> >>>> wrote:
>> >>>>>>
>> >>>>>>> That makes sense.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Guozhang
>> >>>>>>>
>> >>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me>
>> wrote:
>> >>>>>>>
>> >>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times by
>> >>>>>>>> unintentionally shipping code that fails to properly implement
>> >>>> logging. I
>> >>>>>>>> always discover this at the exact *worst* moment, too. (Normally
>> at
>> >>>> 3 AM
>> >>>>>>>> during an on-call shift. Hah.) However, if others feel the same
>> way
>> >>>> I
>> >>>>>>> could
>> >>>>>>>> probably be convinced to remove it.
>> >>>>>>>>
>> >>>>>>>> We could also meet halfway and say that when a customized
>> >>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log
>> at
>> >>>> DEBUG
>> >>>>>>>> level instead of WARN level. Would that alternative be appealing
>> to
>> >>>> you?
>> >>>>>>>>
>> >>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <
>> wangguoz@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Thanks for the updates. I made a pass over the wiki again and it
>> >>>> looks
>> >>>>>>>>> good.
>> >>>>>>>>>
>> >>>>>>>>> About whether record collector should still internally log the
>> >>>> error in
>> >>>>>>>>> addition to what the customized ProductionExceptionHandler
>> does. I
>> >>>>>>>>> personally would prefer only to log if the returned value is
>> FAIL
>> >>>> to
>> >>>>>>>>> indicate that this thread is going to shutdown and trigger the
>> >>>>>>> exception
>> >>>>>>>>> handler.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Guozhang
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>> >>>> KIP as
>> >>>>>>>>>> outlined above. The updated KIP is now ready for review again!
>> >>>>>>>>>>
>> >>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>> >>>>>>> mention
>> >>>>>>>>> them
>> >>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>> >>>>>>>>>>>
>> >>>>>>>>>>> The logging is actually provided for in the record collector.
>> >>>>>>>> Whenever
>> >>>>>>>>> a
>> >>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>> >>>>>>>> *impossible*
>> >>>>>>>>> to
>> >>>>>>>>>>> write a handler that totally suppresses production exceptions
>> >>>> from
>> >>>>>>>> the
>> >>>>>>>>>> log.
>> >>>>>>>>>>> As such, the default continue handler just returns the
>> continue
>> >>>>>>>> value.
>> >>>>>>>>> I
>> >>>>>>>>>>> can add details about those semantics to the KIP as well.
>> >>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>> >>>>>>>> matthias@confluent.io
>> >>>>>>>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> One more comment.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> You mention a default implementation for the handler that
>> >>>> fails. I
>> >>>>>>>>>>>> think, this should be part of the public API and thus should
>> >>>> have
>> >>>>>>> a
>> >>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> We could also add a second implementation for the
>> >>>> log-and-move-on
>> >>>>>>>>>>>> strategy, as both are the two most common cases. This class
>> >>>> should
>> >>>>>>>>> also
>> >>>>>>>>>>>> be part of public API (so users can just set in the config)
>> >>>> with a
>> >>>>>>>>>>>> proper name.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> -Matthias
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>> >>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
>> compatible
>> >>>>>>>> with
>> >>>>>>>>>>>> that
>> >>>>>>>>>>>>> PR, but there will be a merge conflict that happens whenever
>> >>>> one
>> >>>>>>>> of
>> >>>>>>>>>> the
>> >>>>>>>>>>>> PRs
>> >>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
>> goes
>> >>>>>>> in
>> >>>>>>>>>>>> first. :)
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>> >>>>>>> wangguoz@gmail.com
>> >>>>>>>>>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> As for the implementation, please note that there is
>> another
>> >>>>>>>>> ongoing
>> >>>>>>>>>> PR
>> >>>>>>>>>>>>>> that may touch the same classes that you are working on:
>> >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
>> see
>> >>>>>>> if
>> >>>>>>>> it
>> >>>>>>>>>> is
>> >>>>>>>>>>>>>> compatible with your changes.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Guozhang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <
>> matt@frmr.me>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>> >>>>>>> currently
>> >>>>>>>>>>>> written:
>> >>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>> >>>>>>> some
>> >>>>>>>>>> tests
>> >>>>>>>>>>>>>>> added,
>> >>>>>>>>>>>>>>> but largely represents the shape I was going for.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> If there are more points that folks would like to discuss,
>> >>>>>>>> please
>> >>>>>>>>>> let
>> >>>>>>>>>>>> me
>> >>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>> >>>>>>>> probably
>> >>>>>>>>>>>> start
>> >>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>> [VOTE] thread.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>> Matt
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <matt@frmr.me
>> >
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
>> either
>> >>>>>>>> want
>> >>>>>>>>>> to
>> >>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
>> that I
>> >>>>>>>> can
>> >>>>>>>>>> spin
>> >>>>>>>>>>>>>> it
>> >>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
>> start
>> >>>>>>> up
>> >>>>>>>>>> after
>> >>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>> >>>>>>> failed
>> >>>>>>>> to
>> >>>>>>>>>>>>>> produce
>> >>>>>>>>>>>>>>>> again.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
>> you
>> >>>>>>> up
>> >>>>>>>> to
>> >>>>>>>>>>>>>> putting
>> >>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
>> order.
>> >>>>>>>>> However
>> >>>>>>>>>>>> if
>> >>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>> >>>>>>>>> wangguoz@gmail.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> >>>>>>> that
>> >>>>>>>>> the
>> >>>>>>>>>>>>>>> record
>> >>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> >>>>>>> trying
>> >>>>>>>>> to
>> >>>>>>>>>>>>>>>>> produce.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> That is right. Thanks.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
>> that
>> >>>>>>>>> since
>> >>>>>>>>>> we
>> >>>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
>> not
>> >>>>>>> be
>> >>>>>>>>>>>>>> implement
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>> >>>> future
>> >>>>>>>>>>>>>> processing.
>> >>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Guozhang
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>> >>>> matt@frmr.me
>> >>>>>>>>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Hello all,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
>> made
>> >>>>>>> it
>> >>>>>>>> so
>> >>>>>>>>>>>> that
>> >>>>>>>>>>>>>>> its
>> >>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>> >>>>>>>> closely
>> >>>>>>>>>> with
>> >>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>> >>>>>>> reply
>> >>>>>>>>>>>> to.....
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>> >>>> around
>> >>>>>>>> the
>> >>>>>>>>>>>>>>> record,
>> >>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>> >>>>>>>> generics
>> >>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>> >>>>>>>>> `ProducerRecord<?
>> >>>>>>>>>>>>>>>>> extends
>> >>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this
>> is a
>> >>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
>> just
>> >>>>>>>> make
>> >>>>>>>>>> it
>> >>>>>>>>>>>>>>>>> harder
>> >>>>>>>>>>>>>>>>>> to work with the key and value.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>> >>>>>>>> `RecordMetadata`
>> >>>>>>>>> as
>> >>>>>>>>>>>>>>> well
>> >>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>> case it is not null?
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> >>>>>>> that
>> >>>>>>>>> the
>> >>>>>>>>>>>>>>> record
>> >>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> >>>>>>> trying
>> >>>>>>>>> to
>> >>>>>>>>>>>>>>>>> produce.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>> following
>> >>>>>>>>>> handling
>> >>>>>>>>>>>>>>>>> logic
>> >>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
>> you
>> >>>>>>>>> think.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>>> Matt
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
>> matt@frmr.me
>> >>>>>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right
>> now
>> >>>>>>> and
>> >>>>>>>> am
>> >>>>>>>>>>>>>>>>> planning
>> >>>>>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>> >>>> conversation
>> >>>>>>>>> later
>> >>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>> week.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>> >>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>> >>>>>>>>>> wangguoz@gmail.com
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>> >>>>>>>>>> ProductionExceptionHandlerResp
>> >>>>>>>>>>>>>>> onse,
>> >>>>>>>>>>>>>>>>>>>> could
>> >>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>> >>>>>>> added
>> >>>>>>>>>> config
>> >>>>>>>>>>>>>>>>> names
>> >>>>>>>>>>>>>>>>>>>> (e.g. what
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>>>>>> 161%3A+streams+
>> >>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>> >>>>>>>>>>>>>>>>>>>> described).
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>> >>>>>>> around
>> >>>>>>>>> the
>> >>>>>>>>>>>>>>>>> record,
>> >>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>> >>>>>>>> generics
>> >>>>>>>>>> of
>> >>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>> >>>>>>>>> `ProducerRecord<?
>> >>>>>>>>>>>>>>>>> extends
>> >>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>> >>>>>>>> `RecordMetadata`
>> >>>>>>>>>> as
>> >>>>>>>>>>>>>>>>> well in
>> >>>>>>>>>>>>>>>>>>>> case it is not null?
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
>> following
>> >>>>>>>>>> handling
>> >>>>>>>>>>>>>>>>> logic
>> >>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>> >>>>>>>> exception
>> >>>>>>>>>>>>>>>>>> immediately
>> >>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>> >>>>>>> drop
>> >>>>>>>>>> record
>> >>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>> >>>>>>> "error"
>> >>>>>>>>>> Kafka
>> >>>>>>>>>>>>>>>>> topic,
>> >>>>>>>>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>> >>>>>>>>>>>>>>>>>>>>
>> https://kafka.apache.org/documentation/#kafka_streams_
>> >>>>>>>>>> monitoring
>> >>>>>>>>>>>>>> )
>> >>>>>>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>> monitoring.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Guozhang
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>> >>>>>>> matt@frmr.me
>> >>>>>>>>>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>> >>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>> >>>>>>>> config
>> >>>>>>>>>>>>>>> name. No
>> >>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
>> should
>> >>>>>>> use
>> >>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>> >>>>>>> matt@frmr.me
>> >>>>>>>>>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>> >>>>>>>>> possible,
>> >>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>> changing
>> >>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>> >>>>>>> However,
>> >>>>>>>>>>>>>>> because
>> >>>>>>>>>>>>>>>>>> we're
>> >>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>> >>>>>>>> guarantee
>> >>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
>> this
>> >>>>>>>>>>>>>>> particular
>> >>>>>>>>>>>>>>>>>>>> handler
>> >>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>> >>>>>>> something
>> >>>>>>>>>>>>>> like:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>> >>>>>>>>>>>>>>>>> ProducerRecord<..>
>> >>>>>>>>>>>>>>>>>>>>> record,
>> >>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>> >>>>>>>> matt@frmr.me>
>> >>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>> >>>>>>> return
>> >>>>>>>>> type
>> >>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>>> I’m
>> >>>>>>>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>> >>>>>>> record
>> >>>>>>>>>>>>>>> itself
>> >>>>>>>>>>>>>>>>>>>> without
>> >>>>>>>>>>>>>>>>>>>>>>> other changes.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> >>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>> >>>>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>>>>>> 161%3A+streams+
>> >>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>> >>>>>>> naming,
>> >>>>>>>>>>>>>>> return
>> >>>>>>>>>>>>>>>>>>>> types,
>> >>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>> >>>>>>> unified
>> >>>>>>>>>>>>>> user
>> >>>>>>>>>>>>>>>>>>>>> experience.
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update
>> the
>> >>>>>>>> KIP
>> >>>>>>>>>>>>>>>>>>>> accordingly.
>> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>> >>>>>>>>>>>>>>> yuzhihong@gmail.com>
>> >>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the
>> KIP ?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>> >>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>> >>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should
>> be
>> >>>>>>>>>>>>>>> specified.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>> >>>>>>>>>>>>>>> matt@frmr.me>
>> >>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP
>> that I
>> >>>>>>>> just
>> >>>>>>>>>>>>>>> filed
>> >>>>>>>>>>>>>>>>>>>> here:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>> >>>>>>>> error+handling++when+Kafka+
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>> >>>>>>> folks
>> >>>>>>>>>>>>>> about
>> >>>>>>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>>>> idea
>> >>>>>>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
>> back.
>> >>>>>>> :)
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>>>>> -- Guozhang
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>>> -- Guozhang
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>> -- Guozhang
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> -- Guozhang
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> -- Guozhang
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >
>>
>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Hi there, sorry for the delay in responding. Last week had a holiday and
several busy work days in it so I'm just now getting around to responding.

> We would only exclude
> exception Streams can handle itself (like ProducerFencedException) --
> thus, if the handler has code to react to this, it would not be bad, as
> this code is just never called.
[...]
> Thus, I am still in favor of not calling the ProductionExceptionHandler
> for fatal exception.

Acknowledged, so is ProducerFencedException the only kind of exception I
need to change my behavior on? Or are there other types I need to check? Is
there a comprehensive list somewhere?

> About the "always continue" case. Sounds good to me to remove it (not
> sure why we need it in test package?)

I include it in the test package because I have tests that assert that if
the record collector impl encounters an Exception and receives a CONTINUE
that it actually does CONTINUE.

> What is there reasoning for invoking the handler only for the first
> exception?

I didn't want to invoke the handler in places where the CONTINUE or FAIL
result would just be ignored. Presumably, after a FAIL has been returned,
subsequent exceptions are likely to be repeats or noise from my
understanding of the code paths. If this is incorrect we can revisit.

Once I get the answers to these questions I can make another pass on the
pull request!

Matt

On Mon, Nov 20, 2017 at 4:07 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for following up!
>
> One thought about an older reply from you:
>
> >>>> I strongly disagree here. The purpose of this handler isn't *just* to
> >>>> make a decision for streams. There may also be desirable side effects
> that
> >>>> users wish to cause when production exceptions occur. There may be
> >>>> side-effects that they wish to cause when AuthenticationExceptions
> occur,
> >>>> as well. We should still give them the hooks to preform those side
> effects.
>
> And your follow up:
>
> >>    - I think I would rather invoke it for all exceptions that could
> occur
> >>    from attempting to produce - even those exceptions were returning
> CONTINUE
> >>    may not be a good idea (e.g. Authorization exception). Until there
> is a
> >>    different type for exceptions that are totally fatal (for example a
> >>    FatalStreamsException or some sort), maintaining a list of
> exceptions that
> >>    you can intercept with this handler and exceptions you cannot would
> be
> >>    really error-prone and hard to keep correct.
>
> I understand what you are saying, however, consider that Streams needs
> to die for a fatal exception. Thus, if you call the handler for a fatal
> exception, we would  need to ignore the return value and fail -- this
> seems to be rather intuitive. Furthermore, users can register an
> uncaught-exception-handler and side effects for fatal errors can be
> triggered there.
>
> Btw: an AuthorizationException is fatal -- not sure what you mean by an
> "totally fatal" exception -- there is no superlative to fatal from my
> understanding.
>
> About maintaining a list of exceptions: I don't think this is too hard,
> and users also don't need to worry about it IMHO. We would only exclude
> exception Streams can handle itself (like ProducerFencedException) --
> thus, if the handler has code to react to this, it would not be bad, as
> this code is just never called. In case that there is an exception
> Streams could actually handle but we still call the handler (ie, bug),
> there should be no harm either -- the handler needs to return either
> CONTINUE or FAIL and we would obey. It could only happen, that Streams
> dies---as request by the user(!)---even if Streams could actually handle
> the error and move on. But this seems to be not a issue from my point of
> view.
>
> Thus, I am still in favor of not calling the ProductionExceptionHandler
> for fatal exception.
>
>
>
> About the "always continue" case. Sounds good to me to remove it (not
> sure why we need it in test package?) and to rename the "failing"
> handler to "Default" (even if "default" is less descriptive and I would
> still prefer "Fail" in the name).
>
>
> Last question:
>
> >>       - Continue to *only* invoke it on the first exception that we
> >>       encounter (before sendException is set)
>
>
> What is there reasoning for invoking the handler only for the first
> exception?
>
>
>
>
> -Matthias
>
> On 11/20/17 10:43 AM, Matt Farmer wrote:
> > Alright, here are some updates I'm planning to make after thinking on
> this
> > for awhile:
> >
> >    - Given that the "always continue" handler isn't something I'd
> recommend
> >    for production use as is, I'm going to move it into the test
> namespace and
> >    remove it from mention in the public API.
> >    - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
> >    "DefaultProductionExceptionHandler"
> >    - Given that the API for the exception handler involves being invoked
> by
> >    streams to make a decision about whether or not to continue — I think
> that
> >    we should:
> >       - Continue to *only* invoke it on the first exception that we
> >       encounter (before sendException is set)
> >       - Stop invoking it for the self-healing fenced exceptions.
> >    - I think I would rather invoke it for all exceptions that could occur
> >    from attempting to produce - even those exceptions were returning
> CONTINUE
> >    may not be a good idea (e.g. Authorization exception). Until there is
> a
> >    different type for exceptions that are totally fatal (for example a
> >    FatalStreamsException or some sort), maintaining a list of exceptions
> that
> >    you can intercept with this handler and exceptions you cannot would be
> >    really error-prone and hard to keep correct.
> >       - I'm happy to file a KIP for the creation of this new Exception
> type
> >       if there is interest.
> >
> > @ Matthias — What do you think about the above?
> >
> > On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> >
> >> I responded before reading your code review and didn't see the bit about
> >> how ProducerFencedException is self-healing. This error handling logic
> is
> >> *quite* confusing to reason about... I think I'm going to sit down with
> >> the code a bit more today, but I'm inclined to think that if the fenced
> >> exceptions are, indeed, self healing that we still invoke the handler
> but
> >> ignore its result for those exceptions.
> >>
> >> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
> >>
> >>> Hi there,
> >>>
> >>> Following up here...
> >>>
> >>>> One tiny comment: I would prefer to remove the "Always" from the
> >>> handler implementation names -- it sounds "cleaner" to me without it.
> >>>
> >>> Let me think on this. I generally prefer expressiveness to clean-ness,
> >>> and I think that comes out in the names I chose for things. The straw
> man
> >>> in my head is the person that tried to substitute in the
> "AlwaysContinue"
> >>> variant and the "Always" is a trigger to really consider whether or not
> >>> they *always* want to try to continue.
> >>>
> >>> To be truthful, after some thought, using the "AlwaysContinue" variant
> >>> isn't something I'd recommend generally in a production system. Ideally
> >>> these handlers are targeted at handling a specific series of exceptions
> >>> that a user wants to ignore, and not ignoring all exceptions. More on
> this
> >>> in a minute.
> >>>
> >>>> For the first category, it seems to not make sense to call the handle
> but
> >>> Streams should always fail. If we follow this design, the KIP should
> list
> >>> all exceptions for which the handler is not called.
> >>>
> >>> I strongly disagree here. The purpose of this handler isn't *just* to
> >>> make a decision for streams. There may also be desirable side effects
> that
> >>> users wish to cause when production exceptions occur. There may be
> >>> side-effects that they wish to cause when AuthenticationExceptions
> occur,
> >>> as well. We should still give them the hooks to preform those side
> effects.
> >>>
> >>> In light of the above, I'm thinking that the
> >>> "AlwaysContinueProductionExceptionHandler" variant should probably be
> >>> removed from the public API and moved into tests since that's where
> it's
> >>> most useful. The more I think on it, the more I feel that having that
> in
> >>> the public API is a landmine. If you agree, then, we can rename the
> >>> "AlwaysFailProductionExceptionHandler" to
> >>> "DefaultProductionExceptionHandler".
> >>>
> >>> Thoughts?
> >>>
> >>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <matthias@confluent.io
> >
> >>> wrote:
> >>>
> >>>> I just review the PR, and there is one thing we should discuss.
> >>>>
> >>>> There are different types of exceptions that could occur. Some apply
> to
> >>>> all records (like Authorization exception) while others are for single
> >>>> records only (like record too large).
> >>>>
> >>>> For the first category, it seems to not make sense to call the handle
> >>>> but Streams should always fail. If we follow this design, the KIP
> should
> >>>> list all exceptions for which the handler is not called.
> >>>>
> >>>> WDYT?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> >>>>> Just catching up on this KIP.
> >>>>>
> >>>>> One tiny comment: I would prefer to remove the "Always" from the
> >>>> handler
> >>>>> implementation names -- it sounds "cleaner" to me without it.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
> >>>>>> It is agreed, then. I've updated the pull request. I'm trying to
> also
> >>>>>> update the KIP accordingly, but cwiki is being slow and dropping
> >>>>>> connections..... I'll try again a bit later but please consider the
> >>>> KIP
> >>>>>> updated for all intents and purposes. heh.
> >>>>>>
> >>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> That makes sense.
> >>>>>>>
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>>
> >>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times by
> >>>>>>>> unintentionally shipping code that fails to properly implement
> >>>> logging. I
> >>>>>>>> always discover this at the exact *worst* moment, too. (Normally
> at
> >>>> 3 AM
> >>>>>>>> during an on-call shift. Hah.) However, if others feel the same
> way
> >>>> I
> >>>>>>> could
> >>>>>>>> probably be convinced to remove it.
> >>>>>>>>
> >>>>>>>> We could also meet halfway and say that when a customized
> >>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log
> at
> >>>> DEBUG
> >>>>>>>> level instead of WARN level. Would that alternative be appealing
> to
> >>>> you?
> >>>>>>>>
> >>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wangguoz@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks for the updates. I made a pass over the wiki again and it
> >>>> looks
> >>>>>>>>> good.
> >>>>>>>>>
> >>>>>>>>> About whether record collector should still internally log the
> >>>> error in
> >>>>>>>>> addition to what the customized ProductionExceptionHandler does.
> I
> >>>>>>>>> personally would prefer only to log if the returned value is FAIL
> >>>> to
> >>>>>>>>> indicate that this thread is going to shutdown and trigger the
> >>>>>>> exception
> >>>>>>>>> handler.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Guozhang
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me>
> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
> >>>> KIP as
> >>>>>>>>>> outlined above. The updated KIP is now ready for review again!
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
> >>>>>>> mention
> >>>>>>>>> them
> >>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
> >>>>>>>>>>>
> >>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
> >>>>>>>>>>>
> >>>>>>>>>>> The logging is actually provided for in the record collector.
> >>>>>>>> Whenever
> >>>>>>>>> a
> >>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
> >>>>>>>> *impossible*
> >>>>>>>>> to
> >>>>>>>>>>> write a handler that totally suppresses production exceptions
> >>>> from
> >>>>>>>> the
> >>>>>>>>>> log.
> >>>>>>>>>>> As such, the default continue handler just returns the continue
> >>>>>>>> value.
> >>>>>>>>> I
> >>>>>>>>>>> can add details about those semantics to the KIP as well.
> >>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> >>>>>>>> matthias@confluent.io
> >>>>>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> One more comment.
> >>>>>>>>>>>>
> >>>>>>>>>>>> You mention a default implementation for the handler that
> >>>> fails. I
> >>>>>>>>>>>> think, this should be part of the public API and thus should
> >>>> have
> >>>>>>> a
> >>>>>>>>>>>> proper defined named that is mentioned in the KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> We could also add a second implementation for the
> >>>> log-and-move-on
> >>>>>>>>>>>> strategy, as both are the two most common cases. This class
> >>>> should
> >>>>>>>>> also
> >>>>>>>>>>>> be part of public API (so users can just set in the config)
> >>>> with a
> >>>>>>>>>>>> proper name.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are
> compatible
> >>>>>>>> with
> >>>>>>>>>>>> that
> >>>>>>>>>>>>> PR, but there will be a merge conflict that happens whenever
> >>>> one
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>>> PRs
> >>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148
> goes
> >>>>>>> in
> >>>>>>>>>>>> first. :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> >>>>>>> wangguoz@gmail.com
> >>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> As for the implementation, please note that there is another
> >>>>>>>>> ongoing
> >>>>>>>>>> PR
> >>>>>>>>>>>>>> that may touch the same classes that you are working on:
> >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> So it may help if you can also take a look at that PR and
> see
> >>>>>>> if
> >>>>>>>> it
> >>>>>>>>>> is
> >>>>>>>>>>>>>> compatible with your changes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <matt@frmr.me
> >
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
> >>>>>>> currently
> >>>>>>>>>>>> written:
> >>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
> >>>>>>> some
> >>>>>>>>>> tests
> >>>>>>>>>>>>>>> added,
> >>>>>>>>>>>>>>> but largely represents the shape I was going for.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If there are more points that folks would like to discuss,
> >>>>>>>> please
> >>>>>>>>>> let
> >>>>>>>>>>>> me
> >>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
> >>>>>>>> probably
> >>>>>>>>>>>> start
> >>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>> [VOTE] thread.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I
> either
> >>>>>>>> want
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so
> that I
> >>>>>>>> can
> >>>>>>>>>> spin
> >>>>>>>>>>>>>> it
> >>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we
> start
> >>>>>>> up
> >>>>>>>>>> after
> >>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
> >>>>>>> failed
> >>>>>>>> to
> >>>>>>>>>>>>>> produce
> >>>>>>>>>>>>>>>> again.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens
> you
> >>>>>>> up
> >>>>>>>> to
> >>>>>>>>>>>>>> putting
> >>>>>>>>>>>>>>>> messages into the destination topic in an unexpected
> order.
> >>>>>>>>> However
> >>>>>>>>>>>> if
> >>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> >>>>>>>>> wangguoz@gmail.com>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
> >>>>>>> that
> >>>>>>>>> the
> >>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
> >>>>>>> trying
> >>>>>>>>> to
> >>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> That is right. Thanks.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized
> that
> >>>>>>>>> since
> >>>>>>>>>> we
> >>>>>>>>>>>>>> are
> >>>>>>>>>>>>>>>>> not passing the context in the handle function, we may
> not
> >>>>>>> be
> >>>>>>>>>>>>>> implement
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
> >>>> future
> >>>>>>>>>>>>>> processing.
> >>>>>>>>>>>>>>>>> Would people think that would be a big issue?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
> >>>> matt@frmr.me
> >>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and
> made
> >>>>>>> it
> >>>>>>>> so
> >>>>>>>>>>>> that
> >>>>>>>>>>>>>>> its
> >>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
> >>>>>>>> closely
> >>>>>>>>>> with
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
> >>>>>>> reply
> >>>>>>>>>>>> to.....
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
> >>>> around
> >>>>>>>> the
> >>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> >>>>>>>> generics
> >>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this is
> a
> >>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would
> just
> >>>>>>>> make
> >>>>>>>>>> it
> >>>>>>>>>>>>>>>>> harder
> >>>>>>>>>>>>>>>>>> to work with the key and value.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>> `RecordMetadata`
> >>>>>>>>> as
> >>>>>>>>>>>>>>> well
> >>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
> >>>>>>> that
> >>>>>>>>> the
> >>>>>>>>>>>>>>> record
> >>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
> >>>>>>> trying
> >>>>>>>>> to
> >>>>>>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> following
> >>>>>>>>>> handling
> >>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what
> you
> >>>>>>>>> think.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <
> matt@frmr.me
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
> >>>>>>> and
> >>>>>>>> am
> >>>>>>>>>>>>>>>>> planning
> >>>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
> >>>> conversation
> >>>>>>>>> later
> >>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>> week.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> >>>>>>>>>> wangguoz@gmail.com
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
> >>>>>>>>>> ProductionExceptionHandlerResp
> >>>>>>>>>>>>>>> onse,
> >>>>>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
> >>>>>>> added
> >>>>>>>>>> config
> >>>>>>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>>>>>> (e.g. what
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>> described).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
> >>>>>>> around
> >>>>>>>>> the
> >>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> >>>>>>>> generics
> >>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>>>>>> `ProducerRecord<?
> >>>>>>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>>>>>> `RecordMetadata`
> >>>>>>>>>> as
> >>>>>>>>>>>>>>>>> well in
> >>>>>>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the
> following
> >>>>>>>>>> handling
> >>>>>>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
> >>>>>>>> exception
> >>>>>>>>>>>>>>>>>> immediately
> >>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
> >>>>>>> drop
> >>>>>>>>>> record
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
> >>>>>>> "error"
> >>>>>>>>>> Kafka
> >>>>>>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
> >>>>>>>>>>>>>>>>>>>>
> https://kafka.apache.org/documentation/#kafka_streams_
> >>>>>>>>>> monitoring
> >>>>>>>>>>>>>> )
> >>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> monitoring.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> >>>>>>> matt@frmr.me
> >>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
> >>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
> >>>>>>>> config
> >>>>>>>>>>>>>>> name. No
> >>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this
> should
> >>>>>>> use
> >>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> >>>>>>> matt@frmr.me
> >>>>>>>>>
> >>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
> >>>>>>>>> possible,
> >>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
> >>>>>>> However,
> >>>>>>>>>>>>>>> because
> >>>>>>>>>>>>>>>>>> we're
> >>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
> >>>>>>>> guarantee
> >>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time
> this
> >>>>>>>>>>>>>>> particular
> >>>>>>>>>>>>>>>>>>>> handler
> >>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
> >>>>>>> something
> >>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> >>>>>>>>>>>>>>>>> ProducerRecord<..>
> >>>>>>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>>>>>> final Exception exception)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> >>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
> >>>>>>> return
> >>>>>>>>> type
> >>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>> I’m
> >>>>>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
> >>>>>>> record
> >>>>>>>>>>>>>>> itself
> >>>>>>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>>>>>> other changes.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
> >>>>>>> naming,
> >>>>>>>>>>>>>>> return
> >>>>>>>>>>>>>>>>>>>> types,
> >>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
> >>>>>>> unified
> >>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>> experience.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update
> the
> >>>>>>>> KIP
> >>>>>>>>>>>>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>>>>>>>>>>>>>> yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP
> ?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
> >>>>>>>>>>>>>>> specified.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> >>>>>>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that
> I
> >>>>>>>> just
> >>>>>>>>>>>>>>> filed
> >>>>>>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> >>>>>>>> error+handling++when+Kafka+
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
> >>>>>>> folks
> >>>>>>>>>>>>>> about
> >>>>>>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute
> back.
> >>>>>>> :)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> -- Guozhang
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >
>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for following up!

One thought about an older reply from you:

>>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>>> make a decision for streams. There may also be desirable side effects that
>>>> users wish to cause when production exceptions occur. There may be
>>>> side-effects that they wish to cause when AuthenticationExceptions occur,
>>>> as well. We should still give them the hooks to preform those side effects.

And your follow up:

>>    - I think I would rather invoke it for all exceptions that could occur
>>    from attempting to produce - even those exceptions were returning CONTINUE
>>    may not be a good idea (e.g. Authorization exception). Until there is a
>>    different type for exceptions that are totally fatal (for example a
>>    FatalStreamsException or some sort), maintaining a list of exceptions that
>>    you can intercept with this handler and exceptions you cannot would be
>>    really error-prone and hard to keep correct.

I understand what you are saying, however, consider that Streams needs
to die for a fatal exception. Thus, if you call the handler for a fatal
exception, we would  need to ignore the return value and fail -- this
seems to be rather intuitive. Furthermore, users can register an
uncaught-exception-handler and side effects for fatal errors can be
triggered there.

Btw: an AuthorizationException is fatal -- not sure what you mean by an
"totally fatal" exception -- there is no superlative to fatal from my
understanding.

About maintaining a list of exceptions: I don't think this is too hard,
and users also don't need to worry about it IMHO. We would only exclude
exception Streams can handle itself (like ProducerFencedException) --
thus, if the handler has code to react to this, it would not be bad, as
this code is just never called. In case that there is an exception
Streams could actually handle but we still call the handler (ie, bug),
there should be no harm either -- the handler needs to return either
CONTINUE or FAIL and we would obey. It could only happen, that Streams
dies---as request by the user(!)---even if Streams could actually handle
the error and move on. But this seems to be not a issue from my point of
view.

Thus, I am still in favor of not calling the ProductionExceptionHandler
for fatal exception.



About the "always continue" case. Sounds good to me to remove it (not
sure why we need it in test package?) and to rename the "failing"
handler to "Default" (even if "default" is less descriptive and I would
still prefer "Fail" in the name).


Last question:

>>       - Continue to *only* invoke it on the first exception that we
>>       encounter (before sendException is set)


What is there reasoning for invoking the handler only for the first
exception?




-Matthias

On 11/20/17 10:43 AM, Matt Farmer wrote:
> Alright, here are some updates I'm planning to make after thinking on this
> for awhile:
> 
>    - Given that the "always continue" handler isn't something I'd recommend
>    for production use as is, I'm going to move it into the test namespace and
>    remove it from mention in the public API.
>    - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
>    "DefaultProductionExceptionHandler"
>    - Given that the API for the exception handler involves being invoked by
>    streams to make a decision about whether or not to continue — I think that
>    we should:
>       - Continue to *only* invoke it on the first exception that we
>       encounter (before sendException is set)
>       - Stop invoking it for the self-healing fenced exceptions.
>    - I think I would rather invoke it for all exceptions that could occur
>    from attempting to produce - even those exceptions were returning CONTINUE
>    may not be a good idea (e.g. Authorization exception). Until there is a
>    different type for exceptions that are totally fatal (for example a
>    FatalStreamsException or some sort), maintaining a list of exceptions that
>    you can intercept with this handler and exceptions you cannot would be
>    really error-prone and hard to keep correct.
>       - I'm happy to file a KIP for the creation of this new Exception type
>       if there is interest.
> 
> @ Matthias — What do you think about the above?
> 
> On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:
> 
>> I responded before reading your code review and didn't see the bit about
>> how ProducerFencedException is self-healing. This error handling logic is
>> *quite* confusing to reason about... I think I'm going to sit down with
>> the code a bit more today, but I'm inclined to think that if the fenced
>> exceptions are, indeed, self healing that we still invoke the handler but
>> ignore its result for those exceptions.
>>
>> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>>
>>> Hi there,
>>>
>>> Following up here...
>>>
>>>> One tiny comment: I would prefer to remove the "Always" from the
>>> handler implementation names -- it sounds "cleaner" to me without it.
>>>
>>> Let me think on this. I generally prefer expressiveness to clean-ness,
>>> and I think that comes out in the names I chose for things. The straw man
>>> in my head is the person that tried to substitute in the "AlwaysContinue"
>>> variant and the "Always" is a trigger to really consider whether or not
>>> they *always* want to try to continue.
>>>
>>> To be truthful, after some thought, using the "AlwaysContinue" variant
>>> isn't something I'd recommend generally in a production system. Ideally
>>> these handlers are targeted at handling a specific series of exceptions
>>> that a user wants to ignore, and not ignoring all exceptions. More on this
>>> in a minute.
>>>
>>>> For the first category, it seems to not make sense to call the handle but
>>> Streams should always fail. If we follow this design, the KIP should list
>>> all exceptions for which the handler is not called.
>>>
>>> I strongly disagree here. The purpose of this handler isn't *just* to
>>> make a decision for streams. There may also be desirable side effects that
>>> users wish to cause when production exceptions occur. There may be
>>> side-effects that they wish to cause when AuthenticationExceptions occur,
>>> as well. We should still give them the hooks to preform those side effects.
>>>
>>> In light of the above, I'm thinking that the
>>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>>> removed from the public API and moved into tests since that's where it's
>>> most useful. The more I think on it, the more I feel that having that in
>>> the public API is a landmine. If you agree, then, we can rename the
>>> "AlwaysFailProductionExceptionHandler" to
>>> "DefaultProductionExceptionHandler".
>>>
>>> Thoughts?
>>>
>>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <ma...@confluent.io>
>>> wrote:
>>>
>>>> I just review the PR, and there is one thing we should discuss.
>>>>
>>>> There are different types of exceptions that could occur. Some apply to
>>>> all records (like Authorization exception) while others are for single
>>>> records only (like record too large).
>>>>
>>>> For the first category, it seems to not make sense to call the handle
>>>> but Streams should always fail. If we follow this design, the KIP should
>>>> list all exceptions for which the handler is not called.
>>>>
>>>> WDYT?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>>>> Just catching up on this KIP.
>>>>>
>>>>> One tiny comment: I would prefer to remove the "Always" from the
>>>> handler
>>>>> implementation names -- it sounds "cleaner" to me without it.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 11/5/17 12:57 PM, Matt Farmer wrote:
>>>>>> It is agreed, then. I've updated the pull request. I'm trying to also
>>>>>> update the KIP accordingly, but cwiki is being slow and dropping
>>>>>> connections..... I'll try again a bit later but please consider the
>>>> KIP
>>>>>> updated for all intents and purposes. heh.
>>>>>>
>>>>>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>>> That makes sense.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>
>>>>>>>> Interesting. I'm not sure I agree. I've been bitten many times by
>>>>>>>> unintentionally shipping code that fails to properly implement
>>>> logging. I
>>>>>>>> always discover this at the exact *worst* moment, too. (Normally at
>>>> 3 AM
>>>>>>>> during an on-call shift. Hah.) However, if others feel the same way
>>>> I
>>>>>>> could
>>>>>>>> probably be convinced to remove it.
>>>>>>>>
>>>>>>>> We could also meet halfway and say that when a customized
>>>>>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at
>>>> DEBUG
>>>>>>>> level instead of WARN level. Would that alternative be appealing to
>>>> you?
>>>>>>>>
>>>>>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the updates. I made a pass over the wiki again and it
>>>> looks
>>>>>>>>> good.
>>>>>>>>>
>>>>>>>>> About whether record collector should still internally log the
>>>> error in
>>>>>>>>> addition to what the customized ProductionExceptionHandler does. I
>>>>>>>>> personally would prefer only to log if the returned value is FAIL
>>>> to
>>>>>>>>> indicate that this thread is going to shutdown and trigger the
>>>>>>> exception
>>>>>>>>> handler.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>>>
>>>>>>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>>>> KIP as
>>>>>>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>>>>>> mention
>>>>>>>>> them
>>>>>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>>>>>
>>>>>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>>>>>
>>>>>>>>>>> The logging is actually provided for in the record collector.
>>>>>>>> Whenever
>>>>>>>>> a
>>>>>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>>>>>> *impossible*
>>>>>>>>> to
>>>>>>>>>>> write a handler that totally suppresses production exceptions
>>>> from
>>>>>>>> the
>>>>>>>>>> log.
>>>>>>>>>>> As such, the default continue handler just returns the continue
>>>>>>>> value.
>>>>>>>>> I
>>>>>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>>>>>> matthias@confluent.io
>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One more comment.
>>>>>>>>>>>>
>>>>>>>>>>>> You mention a default implementation for the handler that
>>>> fails. I
>>>>>>>>>>>> think, this should be part of the public API and thus should
>>>> have
>>>>>>> a
>>>>>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> We could also add a second implementation for the
>>>> log-and-move-on
>>>>>>>>>>>> strategy, as both are the two most common cases. This class
>>>> should
>>>>>>>>> also
>>>>>>>>>>>> be part of public API (so users can just set in the config)
>>>> with a
>>>>>>>>>>>> proper name.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
>>>>>>>> with
>>>>>>>>>>>> that
>>>>>>>>>>>>> PR, but there will be a merge conflict that happens whenever
>>>> one
>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>>> PRs
>>>>>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
>>>>>>> in
>>>>>>>>>>>> first. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>>>>>> wangguoz@gmail.com
>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the implementation, please note that there is another
>>>>>>>>> ongoing
>>>>>>>>>> PR
>>>>>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So it may help if you can also take a look at that PR and see
>>>>>>> if
>>>>>>>> it
>>>>>>>>>> is
>>>>>>>>>>>>>> compatible with your changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>>>>>> currently
>>>>>>>>>>>> written:
>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>>>>>> some
>>>>>>>>>> tests
>>>>>>>>>>>>>>> added,
>>>>>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If there are more points that folks would like to discuss,
>>>>>>>> please
>>>>>>>>>> let
>>>>>>>>>>>> me
>>>>>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>>>>>>> probably
>>>>>>>>>>>> start
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Most of the time I would write a handler like this, I either
>>>>>>>> want
>>>>>>>>>> to
>>>>>>>>>>>>>>>> ignore the error or fail and bring everything down so that I
>>>>>>>> can
>>>>>>>>>> spin
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> back up later and resume from earlier offsets. When we start
>>>>>>> up
>>>>>>>>>> after
>>>>>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>>>>>> failed
>>>>>>>> to
>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>>> again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
>>>>>>> up
>>>>>>>> to
>>>>>>>>>>>>>> putting
>>>>>>>>>>>>>>>> messages into the destination topic in an unexpected order.
>>>>>>>>> However
>>>>>>>>>>>> if
>>>>>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>>>>>> wangguoz@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>>>>>> that
>>>>>>>>> the
>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>> trying
>>>>>>>>> to
>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I looked at the example code, and one thing I realized that
>>>>>>>>> since
>>>>>>>>>> we
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> not passing the context in the handle function, we may not
>>>>>>> be
>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> logic to send the fail records into another queue for
>>>> future
>>>>>>>>>>>>>> processing.
>>>>>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>>> matt@frmr.me
>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
>>>>>>> it
>>>>>>>> so
>>>>>>>>>>>> that
>>>>>>>>>>>>>>> its
>>>>>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>>>>>>> closely
>>>>>>>>>> with
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>>>>>> reply
>>>>>>>>>>>> to.....
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>> around
>>>>>>>> the
>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>> generics
>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
>>>>>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
>>>>>>>> make
>>>>>>>>>> it
>>>>>>>>>>>>>>>>> harder
>>>>>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>> `RecordMetadata`
>>>>>>>>> as
>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>>>>>> that
>>>>>>>>> the
>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>>>>>> trying
>>>>>>>>> to
>>>>>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>>>>>>> handling
>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
>>>>>>>>> think.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <matt@frmr.me
>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
>>>>>>> and
>>>>>>>> am
>>>>>>>>>>>>>>>>> planning
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> updating the KIP again with details from this
>>>> conversation
>>>>>>>>> later
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>>>>>> wangguoz@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>>>>>> added
>>>>>>>>>> config
>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>>>>>> around
>>>>>>>>> the
>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>>>>>> generics
>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>>>>>> `ProducerRecord<?
>>>>>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>>>>>> `RecordMetadata`
>>>>>>>>>> as
>>>>>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>>>>>>> handling
>>>>>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>>>>>> exception
>>>>>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>>>>>>> drop
>>>>>>>>>> record
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>>>>>> "error"
>>>>>>>>>> Kafka
>>>>>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>>>>>> monitoring
>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>>>>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>>>>>>> config
>>>>>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
>>>>>>> use
>>>>>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>>>>>> matt@frmr.me
>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>>>>>>> possible,
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>>>>>> However,
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>>>>>> guarantee
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>>>>>> something
>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>>>>>> return
>>>>>>>>> type
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>>>>>> record
>>>>>>>>>>>>>>> itself
>>>>>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>>>>>> naming,
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>>>>>> unified
>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
>>>>>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
>>>>>>>> just
>>>>>>>>>>>>>>> filed
>>>>>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>>>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>>>>>> folks
>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Alright, here are some updates I'm planning to make after thinking on this
for awhile:

   - Given that the "always continue" handler isn't something I'd recommend
   for production use as is, I'm going to move it into the test namespace and
   remove it from mention in the public API.
   - I'm going to rename the "AlwaysFailProductionExceptionHandler" to
   "DefaultProductionExceptionHandler"
   - Given that the API for the exception handler involves being invoked by
   streams to make a decision about whether or not to continue — I think that
   we should:
      - Continue to *only* invoke it on the first exception that we
      encounter (before sendException is set)
      - Stop invoking it for the self-healing fenced exceptions.
   - I think I would rather invoke it for all exceptions that could occur
   from attempting to produce - even those exceptions were returning CONTINUE
   may not be a good idea (e.g. Authorization exception). Until there is a
   different type for exceptions that are totally fatal (for example a
   FatalStreamsException or some sort), maintaining a list of exceptions that
   you can intercept with this handler and exceptions you cannot would be
   really error-prone and hard to keep correct.
      - I'm happy to file a KIP for the creation of this new Exception type
      if there is interest.

@ Matthias — What do you think about the above?

On Tue, Nov 14, 2017 at 9:44 AM Matt Farmer <ma...@frmr.me> wrote:

> I responded before reading your code review and didn't see the bit about
> how ProducerFencedException is self-healing. This error handling logic is
> *quite* confusing to reason about... I think I'm going to sit down with
> the code a bit more today, but I'm inclined to think that if the fenced
> exceptions are, indeed, self healing that we still invoke the handler but
> ignore its result for those exceptions.
>
> On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:
>
>> Hi there,
>>
>> Following up here...
>>
>> > One tiny comment: I would prefer to remove the "Always" from the
>> handler implementation names -- it sounds "cleaner" to me without it.
>>
>> Let me think on this. I generally prefer expressiveness to clean-ness,
>> and I think that comes out in the names I chose for things. The straw man
>> in my head is the person that tried to substitute in the "AlwaysContinue"
>> variant and the "Always" is a trigger to really consider whether or not
>> they *always* want to try to continue.
>>
>> To be truthful, after some thought, using the "AlwaysContinue" variant
>> isn't something I'd recommend generally in a production system. Ideally
>> these handlers are targeted at handling a specific series of exceptions
>> that a user wants to ignore, and not ignoring all exceptions. More on this
>> in a minute.
>>
>> > For the first category, it seems to not make sense to call the handle but
>> Streams should always fail. If we follow this design, the KIP should list
>> all exceptions for which the handler is not called.
>>
>> I strongly disagree here. The purpose of this handler isn't *just* to
>> make a decision for streams. There may also be desirable side effects that
>> users wish to cause when production exceptions occur. There may be
>> side-effects that they wish to cause when AuthenticationExceptions occur,
>> as well. We should still give them the hooks to preform those side effects.
>>
>> In light of the above, I'm thinking that the
>> "AlwaysContinueProductionExceptionHandler" variant should probably be
>> removed from the public API and moved into tests since that's where it's
>> most useful. The more I think on it, the more I feel that having that in
>> the public API is a landmine. If you agree, then, we can rename the
>> "AlwaysFailProductionExceptionHandler" to
>> "DefaultProductionExceptionHandler".
>>
>> Thoughts?
>>
>> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> I just review the PR, and there is one thing we should discuss.
>>>
>>> There are different types of exceptions that could occur. Some apply to
>>> all records (like Authorization exception) while others are for single
>>> records only (like record too large).
>>>
>>> For the first category, it seems to not make sense to call the handle
>>> but Streams should always fail. If we follow this design, the KIP should
>>> list all exceptions for which the handler is not called.
>>>
>>> WDYT?
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>>> > Just catching up on this KIP.
>>> >
>>> > One tiny comment: I would prefer to remove the "Always" from the
>>> handler
>>> > implementation names -- it sounds "cleaner" to me without it.
>>> >
>>> >
>>> > -Matthias
>>> >
>>> > On 11/5/17 12:57 PM, Matt Farmer wrote:
>>> >> It is agreed, then. I've updated the pull request. I'm trying to also
>>> >> update the KIP accordingly, but cwiki is being slow and dropping
>>> >> connections..... I'll try again a bit later but please consider the
>>> KIP
>>> >> updated for all intents and purposes. heh.
>>> >>
>>> >> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>>> wrote:
>>> >>
>>> >>> That makes sense.
>>> >>>
>>> >>>
>>> >>> Guozhang
>>> >>>
>>> >>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>>> >>>
>>> >>>> Interesting. I'm not sure I agree. I've been bitten many times by
>>> >>>> unintentionally shipping code that fails to properly implement
>>> logging. I
>>> >>>> always discover this at the exact *worst* moment, too. (Normally at
>>> 3 AM
>>> >>>> during an on-call shift. Hah.) However, if others feel the same way
>>> I
>>> >>> could
>>> >>>> probably be convinced to remove it.
>>> >>>>
>>> >>>> We could also meet halfway and say that when a customized
>>> >>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at
>>> DEBUG
>>> >>>> level instead of WARN level. Would that alternative be appealing to
>>> you?
>>> >>>>
>>> >>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>>> Thanks for the updates. I made a pass over the wiki again and it
>>> looks
>>> >>>>> good.
>>> >>>>>
>>> >>>>> About whether record collector should still internally log the
>>> error in
>>> >>>>> addition to what the customized ProductionExceptionHandler does. I
>>> >>>>> personally would prefer only to log if the returned value is FAIL
>>> to
>>> >>>>> indicate that this thread is going to shutdown and trigger the
>>> >>> exception
>>> >>>>> handler.
>>> >>>>>
>>> >>>>>
>>> >>>>> Guozhang
>>> >>>>>
>>> >>>>>
>>> >>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>>> >>>>>
>>> >>>>>> Hello, a bit later than I'd anticipated, but I've updated this
>>> KIP as
>>> >>>>>> outlined above. The updated KIP is now ready for review again!
>>> >>>>>>
>>> >>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>>> >>>>>>
>>> >>>>>>> Ah. I actually created both of those in the PR and forgot to
>>> >>> mention
>>> >>>>> them
>>> >>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>> >>>>>>>
>>> >>>>>>> I’ll revise the KIP this afternoon accordingly.
>>> >>>>>>>
>>> >>>>>>> The logging is actually provided for in the record collector.
>>> >>>> Whenever
>>> >>>>> a
>>> >>>>>>> handler continues it’ll log a warning to ensure that it’s
>>> >>>> *impossible*
>>> >>>>> to
>>> >>>>>>> write a handler that totally suppresses production exceptions
>>> from
>>> >>>> the
>>> >>>>>> log.
>>> >>>>>>> As such, the default continue handler just returns the continue
>>> >>>> value.
>>> >>>>> I
>>> >>>>>>> can add details about those semantics to the KIP as well.
>>> >>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>> >>>> matthias@confluent.io
>>> >>>>>>
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>>> One more comment.
>>> >>>>>>>>
>>> >>>>>>>> You mention a default implementation for the handler that
>>> fails. I
>>> >>>>>>>> think, this should be part of the public API and thus should
>>> have
>>> >>> a
>>> >>>>>>>> proper defined named that is mentioned in the KIP.
>>> >>>>>>>>
>>> >>>>>>>> We could also add a second implementation for the
>>> log-and-move-on
>>> >>>>>>>> strategy, as both are the two most common cases. This class
>>> should
>>> >>>>> also
>>> >>>>>>>> be part of public API (so users can just set in the config)
>>> with a
>>> >>>>>>>> proper name.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> -Matthias
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>> >>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
>>> >>>> with
>>> >>>>>>>> that
>>> >>>>>>>>> PR, but there will be a merge conflict that happens whenever
>>> one
>>> >>>> of
>>> >>>>>> the
>>> >>>>>>>> PRs
>>> >>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
>>> >>> in
>>> >>>>>>>> first. :)
>>> >>>>>>>>>
>>> >>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>> >>> wangguoz@gmail.com
>>> >>>>>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> That sounds reasonable, thanks Matt.
>>> >>>>>>>>>>
>>> >>>>>>>>>> As for the implementation, please note that there is another
>>> >>>>> ongoing
>>> >>>>>> PR
>>> >>>>>>>>>> that may touch the same classes that you are working on:
>>> >>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>> >>>>>>>>>>
>>> >>>>>>>>>> So it may help if you can also take a look at that PR and see
>>> >>> if
>>> >>>> it
>>> >>>>>> is
>>> >>>>>>>>>> compatible with your changes.
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Guozhang
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
>>> >>>>> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> I've opened this pull request to implement the KIP as
>>> >>> currently
>>> >>>>>>>> written:
>>> >>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>> >>> some
>>> >>>>>> tests
>>> >>>>>>>>>>> added,
>>> >>>>>>>>>>> but largely represents the shape I was going for.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> If there are more points that folks would like to discuss,
>>> >>>> please
>>> >>>>>> let
>>> >>>>>>>> me
>>> >>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>> >>>> probably
>>> >>>>>>>> start
>>> >>>>>>>>>> a
>>> >>>>>>>>>>> [VOTE] thread.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks,
>>> >>>>>>>>>>> Matt
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>>> >>>> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> Most of the time I would write a handler like this, I either
>>> >>>> want
>>> >>>>>> to
>>> >>>>>>>>>>>> ignore the error or fail and bring everything down so that I
>>> >>>> can
>>> >>>>>> spin
>>> >>>>>>>>>> it
>>> >>>>>>>>>>>> back up later and resume from earlier offsets. When we start
>>> >>> up
>>> >>>>>> after
>>> >>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>> >>> failed
>>> >>>> to
>>> >>>>>>>>>> produce
>>> >>>>>>>>>>>> again.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
>>> >>> up
>>> >>>> to
>>> >>>>>>>>>> putting
>>> >>>>>>>>>>>> messages into the destination topic in an unexpected order.
>>> >>>>> However
>>> >>>>>>>> if
>>> >>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>> >>>>> wangguoz@gmail.com>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>> >>> that
>>> >>>>> the
>>> >>>>>>>>>>> record
>>> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>> >>> trying
>>> >>>>> to
>>> >>>>>>>>>>>>> produce.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> That is right. Thanks.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> I looked at the example code, and one thing I realized that
>>> >>>>> since
>>> >>>>>> we
>>> >>>>>>>>>> are
>>> >>>>>>>>>>>>> not passing the context in the handle function, we may not
>>> >>> be
>>> >>>>>>>>>> implement
>>> >>>>>>>>>>>>> the
>>> >>>>>>>>>>>>> logic to send the fail records into another queue for
>>> future
>>> >>>>>>>>>> processing.
>>> >>>>>>>>>>>>> Would people think that would be a big issue?
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Guozhang
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <
>>> matt@frmr.me
>>> >>>>
>>> >>>>>> wrote:
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Hello all,
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
>>> >>> it
>>> >>>> so
>>> >>>>>>>> that
>>> >>>>>>>>>>> its
>>> >>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>> >>>> closely
>>> >>>>>> with
>>> >>>>>>>>>>> the
>>> >>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>> >>> reply
>>> >>>>>>>> to.....
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> The question I had about then handle parameters are
>>> around
>>> >>>> the
>>> >>>>>>>>>>> record,
>>> >>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>> >>>> generics
>>> >>>>> of
>>> >>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>> >>>>> `ProducerRecord<?
>>> >>>>>>>>>>>>> extends
>>> >>>>>>>>>>>>>>> Object, ? extends Object>`?
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
>>> >>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
>>> >>>> make
>>> >>>>>> it
>>> >>>>>>>>>>>>> harder
>>> >>>>>>>>>>>>>> to work with the key and value.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Also, should the handle function include the
>>> >>>> `RecordMetadata`
>>> >>>>> as
>>> >>>>>>>>>>> well
>>> >>>>>>>>>>>>> in
>>> >>>>>>>>>>>>>>> case it is not null?
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>> >>> that
>>> >>>>> the
>>> >>>>>>>>>>> record
>>> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>> >>> trying
>>> >>>>> to
>>> >>>>>>>>>>>>> produce.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> We may probably try to write down at least the following
>>> >>>>>> handling
>>> >>>>>>>>>>>>> logic
>>> >>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>> see if the given API is sufficient for it
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
>>> >>>>> think.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Cheers,
>>> >>>>>>>>>>>>>> Matt
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <matt@frmr.me
>>> >
>>> >>>>>> wrote:
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
>>> >>> and
>>> >>>> am
>>> >>>>>>>>>>>>> planning
>>> >>>>>>>>>>>>>> on
>>> >>>>>>>>>>>>>>> updating the KIP again with details from this
>>> conversation
>>> >>>>> later
>>> >>>>>>>>>>> this
>>> >>>>>>>>>>>>>> week.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>> >>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>> >>>>>> wangguoz@gmail.com
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Regarding the handle interface of
>>> >>>>>> ProductionExceptionHandlerResp
>>> >>>>>>>>>>> onse,
>>> >>>>>>>>>>>>>>>> could
>>> >>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>> >>> added
>>> >>>>>> config
>>> >>>>>>>>>>>>> names
>>> >>>>>>>>>>>>>>>> (e.g. what
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> >>>> 161%3A+streams+
>>> >>>>>>>>>>>>>> deserialization+exception+handlers
>>> >>>>>>>>>>>>>>>> described).
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>> >>> around
>>> >>>>> the
>>> >>>>>>>>>>>>> record,
>>> >>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>> >>>> generics
>>> >>>>>> of
>>> >>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>> >>>>> `ProducerRecord<?
>>> >>>>>>>>>>>>> extends
>>> >>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Also, should the handle function include the
>>> >>>> `RecordMetadata`
>>> >>>>>> as
>>> >>>>>>>>>>>>> well in
>>> >>>>>>>>>>>>>>>> case it is not null?
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>> >>>>>> handling
>>> >>>>>>>>>>>>> logic
>>> >>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>> >>>> exception
>>> >>>>>>>>>>>>>> immediately
>>> >>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>>> >>> drop
>>> >>>>>> record
>>> >>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>> >>> "error"
>>> >>>>>> Kafka
>>> >>>>>>>>>>>>> topic,
>>> >>>>>>>>>>>>>>>> or
>>> >>>>>>>>>>>>>>>> record it as an app-level metrics (
>>> >>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>> >>>>>> monitoring
>>> >>>>>>>>>> )
>>> >>>>>>>>>>>>> for
>>> >>>>>>>>>>>>>>>> monitoring.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Guozhang
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>> >>> matt@frmr.me
>>> >>>>>
>>> >>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> I did some more digging tonight.
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>> >>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>> >>>> config
>>> >>>>>>>>>>> name. No
>>> >>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
>>> >>> use
>>> >>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>> >>> matt@frmr.me
>>> >>>>>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>> >>>>> possible,
>>> >>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>> changing
>>> >>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>> >>> However,
>>> >>>>>>>>>>> because
>>> >>>>>>>>>>>>>> we're
>>> >>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>> >>>> guarantee
>>> >>>>>>>>>>> that
>>> >>>>>>>>>>>>> the
>>> >>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>>> >>>>>>>>>>> particular
>>> >>>>>>>>>>>>>>>> handler
>>> >>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>> >>> something
>>> >>>>>>>>>> like:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>> >>>>>>>>>>>>> ProducerRecord<..>
>>> >>>>>>>>>>>>>>>>> record,
>>> >>>>>>>>>>>>>>>>>> final Exception exception)
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Would this be acceptable?
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>> >>>> matt@frmr.me>
>>> >>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>> >>> return
>>> >>>>> type
>>> >>>>>>>>>>> but
>>> >>>>>>>>>>>>>> I’m
>>> >>>>>>>>>>>>>>>> not
>>> >>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>> >>> record
>>> >>>>>>>>>>> itself
>>> >>>>>>>>>>>>>>>> without
>>> >>>>>>>>>>>>>>>>>>> other changes.
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>> >>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>> >>>>>>>>>>>>>>>> matthias@confluent.io>
>>> >>>>>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> >>>> 161%3A+streams+
>>> >>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>> >>> naming,
>>> >>>>>>>>>>> return
>>> >>>>>>>>>>>>>>>> types,
>>> >>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>> >>> unified
>>> >>>>>>>>>> user
>>> >>>>>>>>>>>>>>>>> experience.
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> -Matthias
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>> >>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
>>> >>>> KIP
>>> >>>>>>>>>>>>>>>> accordingly.
>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>> >>>>>>>>>>> yuzhihong@gmail.com>
>>> >>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>> >>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>> >>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
>>> >>>>>>>>>>> specified.
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> Cheers
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>> >>>>>>>>>>> matt@frmr.me>
>>> >>>>>>>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
>>> >>>> just
>>> >>>>>>>>>>> filed
>>> >>>>>>>>>>>>>>>> here:
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> >>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>> >>>> error+handling++when+Kafka+
>>> >>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>> >>> folks
>>> >>>>>>>>>> about
>>> >>>>>>>>>>>>> this
>>> >>>>>>>>>>>>>>>> idea
>>> >>>>>>>>>>>>>>>>>>>> and
>>> >>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
>>> >>> :)
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>> >>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>> >>>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> --
>>> >>>>>>>>>>>>>>>> -- Guozhang
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> --
>>> >>>>>>>>>>>>> -- Guozhang
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> --
>>> >>>>>>>>>> -- Guozhang
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> -- Guozhang
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> -- Guozhang
>>> >>>
>>> >>
>>> >
>>>
>>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I responded before reading your code review and didn't see the bit about
how ProducerFencedException is self-healing. This error handling logic is
*quite* confusing to reason about... I think I'm going to sit down with the
code a bit more today, but I'm inclined to think that if the fenced
exceptions are, indeed, self healing that we still invoke the handler but
ignore its result for those exceptions.

On Tue, Nov 14, 2017 at 9:37 AM Matt Farmer <ma...@frmr.me> wrote:

> Hi there,
>
> Following up here...
>
> > One tiny comment: I would prefer to remove the "Always" from the
> handler implementation names -- it sounds "cleaner" to me without it.
>
> Let me think on this. I generally prefer expressiveness to clean-ness, and
> I think that comes out in the names I chose for things. The straw man in my
> head is the person that tried to substitute in the "AlwaysContinue" variant
> and the "Always" is a trigger to really consider whether or not they
> *always* want to try to continue.
>
> To be truthful, after some thought, using the "AlwaysContinue" variant
> isn't something I'd recommend generally in a production system. Ideally
> these handlers are targeted at handling a specific series of exceptions
> that a user wants to ignore, and not ignoring all exceptions. More on this
> in a minute.
>
> > For the first category, it seems to not make sense to call the handle but
> Streams should always fail. If we follow this design, the KIP should list
> all exceptions for which the handler is not called.
>
> I strongly disagree here. The purpose of this handler isn't *just* to
> make a decision for streams. There may also be desirable side effects that
> users wish to cause when production exceptions occur. There may be
> side-effects that they wish to cause when AuthenticationExceptions occur,
> as well. We should still give them the hooks to preform those side effects.
>
> In light of the above, I'm thinking that the
> "AlwaysContinueProductionExceptionHandler" variant should probably be
> removed from the public API and moved into tests since that's where it's
> most useful. The more I think on it, the more I feel that having that in
> the public API is a landmine. If you agree, then, we can rename the
> "AlwaysFailProductionExceptionHandler" to
> "DefaultProductionExceptionHandler".
>
> Thoughts?
>
> On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
>> I just review the PR, and there is one thing we should discuss.
>>
>> There are different types of exceptions that could occur. Some apply to
>> all records (like Authorization exception) while others are for single
>> records only (like record too large).
>>
>> For the first category, it seems to not make sense to call the handle
>> but Streams should always fail. If we follow this design, the KIP should
>> list all exceptions for which the handler is not called.
>>
>> WDYT?
>>
>>
>> -Matthias
>>
>>
>> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
>> > Just catching up on this KIP.
>> >
>> > One tiny comment: I would prefer to remove the "Always" from the handler
>> > implementation names -- it sounds "cleaner" to me without it.
>> >
>> >
>> > -Matthias
>> >
>> > On 11/5/17 12:57 PM, Matt Farmer wrote:
>> >> It is agreed, then. I've updated the pull request. I'm trying to also
>> >> update the KIP accordingly, but cwiki is being slow and dropping
>> >> connections..... I'll try again a bit later but please consider the KIP
>> >> updated for all intents and purposes. heh.
>> >>
>> >> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>> >>
>> >>> That makes sense.
>> >>>
>> >>>
>> >>> Guozhang
>> >>>
>> >>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>> >>>
>> >>>> Interesting. I'm not sure I agree. I've been bitten many times by
>> >>>> unintentionally shipping code that fails to properly implement
>> logging. I
>> >>>> always discover this at the exact *worst* moment, too. (Normally at
>> 3 AM
>> >>>> during an on-call shift. Hah.) However, if others feel the same way I
>> >>> could
>> >>>> probably be convinced to remove it.
>> >>>>
>> >>>> We could also meet halfway and say that when a customized
>> >>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at
>> DEBUG
>> >>>> level instead of WARN level. Would that alternative be appealing to
>> you?
>> >>>>
>> >>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>> Thanks for the updates. I made a pass over the wiki again and it
>> looks
>> >>>>> good.
>> >>>>>
>> >>>>> About whether record collector should still internally log the
>> error in
>> >>>>> addition to what the customized ProductionExceptionHandler does. I
>> >>>>> personally would prefer only to log if the returned value is FAIL to
>> >>>>> indicate that this thread is going to shutdown and trigger the
>> >>> exception
>> >>>>> handler.
>> >>>>>
>> >>>>>
>> >>>>> Guozhang
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>> >>>>>
>> >>>>>> Hello, a bit later than I'd anticipated, but I've updated this KIP
>> as
>> >>>>>> outlined above. The updated KIP is now ready for review again!
>> >>>>>>
>> >>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>> >>>>>>
>> >>>>>>> Ah. I actually created both of those in the PR and forgot to
>> >>> mention
>> >>>>> them
>> >>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>> >>>>>>>
>> >>>>>>> I’ll revise the KIP this afternoon accordingly.
>> >>>>>>>
>> >>>>>>> The logging is actually provided for in the record collector.
>> >>>> Whenever
>> >>>>> a
>> >>>>>>> handler continues it’ll log a warning to ensure that it’s
>> >>>> *impossible*
>> >>>>> to
>> >>>>>>> write a handler that totally suppresses production exceptions from
>> >>>> the
>> >>>>>> log.
>> >>>>>>> As such, the default continue handler just returns the continue
>> >>>> value.
>> >>>>> I
>> >>>>>>> can add details about those semantics to the KIP as well.
>> >>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>> >>>> matthias@confluent.io
>> >>>>>>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> One more comment.
>> >>>>>>>>
>> >>>>>>>> You mention a default implementation for the handler that fails.
>> I
>> >>>>>>>> think, this should be part of the public API and thus should have
>> >>> a
>> >>>>>>>> proper defined named that is mentioned in the KIP.
>> >>>>>>>>
>> >>>>>>>> We could also add a second implementation for the log-and-move-on
>> >>>>>>>> strategy, as both are the two most common cases. This class
>> should
>> >>>>> also
>> >>>>>>>> be part of public API (so users can just set in the config) with
>> a
>> >>>>>>>> proper name.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> -Matthias
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>> >>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
>> >>>> with
>> >>>>>>>> that
>> >>>>>>>>> PR, but there will be a merge conflict that happens whenever one
>> >>>> of
>> >>>>>> the
>> >>>>>>>> PRs
>> >>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
>> >>> in
>> >>>>>>>> first. :)
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>> >>> wangguoz@gmail.com
>> >>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> That sounds reasonable, thanks Matt.
>> >>>>>>>>>>
>> >>>>>>>>>> As for the implementation, please note that there is another
>> >>>>> ongoing
>> >>>>>> PR
>> >>>>>>>>>> that may touch the same classes that you are working on:
>> >>>>>>>>>> https://github.com/apache/kafka/pull/4148
>> >>>>>>>>>>
>> >>>>>>>>>> So it may help if you can also take a look at that PR and see
>> >>> if
>> >>>> it
>> >>>>>> is
>> >>>>>>>>>> compatible with your changes.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Guozhang
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I've opened this pull request to implement the KIP as
>> >>> currently
>> >>>>>>>> written:
>> >>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>> >>> some
>> >>>>>> tests
>> >>>>>>>>>>> added,
>> >>>>>>>>>>> but largely represents the shape I was going for.
>> >>>>>>>>>>>
>> >>>>>>>>>>> If there are more points that folks would like to discuss,
>> >>>> please
>> >>>>>> let
>> >>>>>>>> me
>> >>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>> >>>> probably
>> >>>>>>>> start
>> >>>>>>>>>> a
>> >>>>>>>>>>> [VOTE] thread.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks,
>> >>>>>>>>>>> Matt
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>> >>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> I can’t think of a reason that would be problematic.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Most of the time I would write a handler like this, I either
>> >>>> want
>> >>>>>> to
>> >>>>>>>>>>>> ignore the error or fail and bring everything down so that I
>> >>>> can
>> >>>>>> spin
>> >>>>>>>>>> it
>> >>>>>>>>>>>> back up later and resume from earlier offsets. When we start
>> >>> up
>> >>>>>> after
>> >>>>>>>>>>>> crashing we’ll eventually try to process the message we
>> >>> failed
>> >>>> to
>> >>>>>>>>>> produce
>> >>>>>>>>>>>> again.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
>> >>> up
>> >>>> to
>> >>>>>>>>>> putting
>> >>>>>>>>>>>> messages into the destination topic in an unexpected order.
>> >>>>> However
>> >>>>>>>> if
>> >>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>> >>>>> wangguoz@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> >>> that
>> >>>>> the
>> >>>>>>>>>>> record
>> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> >>> trying
>> >>>>> to
>> >>>>>>>>>>>>> produce.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> That is right. Thanks.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I looked at the example code, and one thing I realized that
>> >>>>> since
>> >>>>>> we
>> >>>>>>>>>> are
>> >>>>>>>>>>>>> not passing the context in the handle function, we may not
>> >>> be
>> >>>>>>>>>> implement
>> >>>>>>>>>>>>> the
>> >>>>>>>>>>>>> logic to send the fail records into another queue for future
>> >>>>>>>>>> processing.
>> >>>>>>>>>>>>> Would people think that would be a big issue?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Guozhang
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <matt@frmr.me
>> >>>>
>> >>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hello all,
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
>> >>> it
>> >>>> so
>> >>>>>>>> that
>> >>>>>>>>>>> its
>> >>>>>>>>>>>>>> interface, config setting, and parameters line up more
>> >>>> closely
>> >>>>>> with
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I believe there are a few specific questions I need to
>> >>> reply
>> >>>>>>>> to.....
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The question I had about then handle parameters are around
>> >>>> the
>> >>>>>>>>>>> record,
>> >>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>> >>>> generics
>> >>>>> of
>> >>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>> >>>>> `ProducerRecord<?
>> >>>>>>>>>>>>> extends
>> >>>>>>>>>>>>>>> Object, ? extends Object>`?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
>> >>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
>> >>>> make
>> >>>>>> it
>> >>>>>>>>>>>>> harder
>> >>>>>>>>>>>>>> to work with the key and value.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Also, should the handle function include the
>> >>>> `RecordMetadata`
>> >>>>> as
>> >>>>>>>>>>> well
>> >>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>> case it is not null?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> >>> that
>> >>>>> the
>> >>>>>>>>>>> record
>> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> >>> trying
>> >>>>> to
>> >>>>>>>>>>>>> produce.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> We may probably try to write down at least the following
>> >>>>>> handling
>> >>>>>>>>>>>>> logic
>> >>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>> see if the given API is sufficient for it
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
>> >>>>> think.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>> Matt
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
>> >>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
>> >>> and
>> >>>> am
>> >>>>>>>>>>>>> planning
>> >>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>> updating the KIP again with details from this conversation
>> >>>>> later
>> >>>>>>>>>>> this
>> >>>>>>>>>>>>>> week.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>> >>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>> >>>>>> wangguoz@gmail.com
>> >>>>>>>>>>>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Regarding the handle interface of
>> >>>>>> ProductionExceptionHandlerResp
>> >>>>>>>>>>> onse,
>> >>>>>>>>>>>>>>>> could
>> >>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>> >>> added
>> >>>>>> config
>> >>>>>>>>>>>>> names
>> >>>>>>>>>>>>>>>> (e.g. what
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>> 161%3A+streams+
>> >>>>>>>>>>>>>> deserialization+exception+handlers
>> >>>>>>>>>>>>>>>> described).
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> The question I had about then handle parameters are
>> >>> around
>> >>>>> the
>> >>>>>>>>>>>>> record,
>> >>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>> >>>> generics
>> >>>>>> of
>> >>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>> >>>>> `ProducerRecord<?
>> >>>>>>>>>>>>> extends
>> >>>>>>>>>>>>>>>> Object, ? extends Object>`?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Also, should the handle function include the
>> >>>> `RecordMetadata`
>> >>>>>> as
>> >>>>>>>>>>>>> well in
>> >>>>>>>>>>>>>>>> case it is not null?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> We may probably try to write down at least the following
>> >>>>>> handling
>> >>>>>>>>>>>>> logic
>> >>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>> >>>> exception
>> >>>>>>>>>>>>>> immediately
>> >>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>> >>> drop
>> >>>>>> record
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>> >>> "error"
>> >>>>>> Kafka
>> >>>>>>>>>>>>> topic,
>> >>>>>>>>>>>>>>>> or
>> >>>>>>>>>>>>>>>> record it as an app-level metrics (
>> >>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>> >>>>>> monitoring
>> >>>>>>>>>> )
>> >>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>> monitoring.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Guozhang
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>> >>> matt@frmr.me
>> >>>>>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I did some more digging tonight.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>> >>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>> >>>> config
>> >>>>>>>>>>> name. No
>> >>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
>> >>> use
>> >>>>>>>>>>>>>>>>> "default.production.exception.handler".
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>> >>> matt@frmr.me
>> >>>>>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>> >>>>> possible,
>> >>>>>>>>>>> and
>> >>>>>>>>>>>>>>>> changing
>> >>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>> >>> However,
>> >>>>>>>>>>> because
>> >>>>>>>>>>>>>> we're
>> >>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>> >>>> guarantee
>> >>>>>>>>>>> that
>> >>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>> >>>>>>>>>>> particular
>> >>>>>>>>>>>>>>>> handler
>> >>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>> >>> something
>> >>>>>>>>>> like:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>> >>>>>>>>>>>>> ProducerRecord<..>
>> >>>>>>>>>>>>>>>>> record,
>> >>>>>>>>>>>>>>>>>> final Exception exception)
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Would this be acceptable?
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>> >>>> matt@frmr.me>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>> >>> return
>> >>>>> type
>> >>>>>>>>>>> but
>> >>>>>>>>>>>>>> I’m
>> >>>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>> >>> record
>> >>>>>>>>>>> itself
>> >>>>>>>>>>>>>>>> without
>> >>>>>>>>>>>>>>>>>>> other changes.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>> >>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> >>>>>>>>>>>>>>>> matthias@confluent.io>
>> >>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>> 161%3A+streams+
>> >>>>>>>>>>>>>>>>> deserialization+exception+handlers
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>> >>> naming,
>> >>>>>>>>>>> return
>> >>>>>>>>>>>>>>>> types,
>> >>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>> >>> unified
>> >>>>>>>>>> user
>> >>>>>>>>>>>>>>>>> experience.
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> -Matthias
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> >>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
>> >>>> KIP
>> >>>>>>>>>>>>>>>> accordingly.
>> >>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>> >>>>>>>>>>> yuzhihong@gmail.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>> >>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>> >>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
>> >>>>>>>>>>> specified.
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>> >>>>>>>>>>> matt@frmr.me>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
>> >>>> just
>> >>>>>>>>>>> filed
>> >>>>>>>>>>>>>>>> here:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>> >>>> error+handling++when+Kafka+
>> >>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>> >>> folks
>> >>>>>>>>>> about
>> >>>>>>>>>>>>> this
>> >>>>>>>>>>>>>>>> idea
>> >>>>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
>> >>> :)
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>>> -- Guozhang
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> -- Guozhang
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> --
>> >>>>>>>>>> -- Guozhang
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> -- Guozhang
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> -- Guozhang
>> >>>
>> >>
>> >
>>
>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Hi there,

Following up here...

> One tiny comment: I would prefer to remove the "Always" from the handler implementation
names -- it sounds "cleaner" to me without it.

Let me think on this. I generally prefer expressiveness to clean-ness, and
I think that comes out in the names I chose for things. The straw man in my
head is the person that tried to substitute in the "AlwaysContinue" variant
and the "Always" is a trigger to really consider whether or not they
*always* want to try to continue.

To be truthful, after some thought, using the "AlwaysContinue" variant
isn't something I'd recommend generally in a production system. Ideally
these handlers are targeted at handling a specific series of exceptions
that a user wants to ignore, and not ignoring all exceptions. More on this
in a minute.

> For the first category, it seems to not make sense to call the handle but
Streams should always fail. If we follow this design, the KIP should list
all exceptions for which the handler is not called.

I strongly disagree here. The purpose of this handler isn't *just* to make
a decision for streams. There may also be desirable side effects that users
wish to cause when production exceptions occur. There may be side-effects
that they wish to cause when AuthenticationExceptions occur, as well. We
should still give them the hooks to preform those side effects.

In light of the above, I'm thinking that the
"AlwaysContinueProductionExceptionHandler" variant should probably be
removed from the public API and moved into tests since that's where it's
most useful. The more I think on it, the more I feel that having that in
the public API is a landmine. If you agree, then, we can rename the
"AlwaysFailProductionExceptionHandler" to
"DefaultProductionExceptionHandler".

Thoughts?

On Fri, Nov 10, 2017 at 6:13 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> I just review the PR, and there is one thing we should discuss.
>
> There are different types of exceptions that could occur. Some apply to
> all records (like Authorization exception) while others are for single
> records only (like record too large).
>
> For the first category, it seems to not make sense to call the handle
> but Streams should always fail. If we follow this design, the KIP should
> list all exceptions for which the handler is not called.
>
> WDYT?
>
>
> -Matthias
>
>
> On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> > Just catching up on this KIP.
> >
> > One tiny comment: I would prefer to remove the "Always" from the handler
> > implementation names -- it sounds "cleaner" to me without it.
> >
> >
> > -Matthias
> >
> > On 11/5/17 12:57 PM, Matt Farmer wrote:
> >> It is agreed, then. I've updated the pull request. I'm trying to also
> >> update the KIP accordingly, but cwiki is being slow and dropping
> >> connections..... I'll try again a bit later but please consider the KIP
> >> updated for all intents and purposes. heh.
> >>
> >> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >>
> >>> That makes sense.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>>
> >>>> Interesting. I'm not sure I agree. I've been bitten many times by
> >>>> unintentionally shipping code that fails to properly implement
> logging. I
> >>>> always discover this at the exact *worst* moment, too. (Normally at 3
> AM
> >>>> during an on-call shift. Hah.) However, if others feel the same way I
> >>> could
> >>>> probably be convinced to remove it.
> >>>>
> >>>> We could also meet halfway and say that when a customized
> >>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at
> DEBUG
> >>>> level instead of WARN level. Would that alternative be appealing to
> you?
> >>>>
> >>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> Thanks for the updates. I made a pass over the wiki again and it
> looks
> >>>>> good.
> >>>>>
> >>>>> About whether record collector should still internally log the error
> in
> >>>>> addition to what the customized ProductionExceptionHandler does. I
> >>>>> personally would prefer only to log if the returned value is FAIL to
> >>>>> indicate that this thread is going to shutdown and trigger the
> >>> exception
> >>>>> handler.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
> >>>>>
> >>>>>> Hello, a bit later than I'd anticipated, but I've updated this KIP
> as
> >>>>>> outlined above. The updated KIP is now ready for review again!
> >>>>>>
> >>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>
> >>>>>>> Ah. I actually created both of those in the PR and forgot to
> >>> mention
> >>>>> them
> >>>>>>> by name in the KIP! Thanks for pointing out the oversight.
> >>>>>>>
> >>>>>>> I’ll revise the KIP this afternoon accordingly.
> >>>>>>>
> >>>>>>> The logging is actually provided for in the record collector.
> >>>> Whenever
> >>>>> a
> >>>>>>> handler continues it’ll log a warning to ensure that it’s
> >>>> *impossible*
> >>>>> to
> >>>>>>> write a handler that totally suppresses production exceptions from
> >>>> the
> >>>>>> log.
> >>>>>>> As such, the default continue handler just returns the continue
> >>>> value.
> >>>>> I
> >>>>>>> can add details about those semantics to the KIP as well.
> >>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> >>>> matthias@confluent.io
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> One more comment.
> >>>>>>>>
> >>>>>>>> You mention a default implementation for the handler that fails. I
> >>>>>>>> think, this should be part of the public API and thus should have
> >>> a
> >>>>>>>> proper defined named that is mentioned in the KIP.
> >>>>>>>>
> >>>>>>>> We could also add a second implementation for the log-and-move-on
> >>>>>>>> strategy, as both are the two most common cases. This class should
> >>>>> also
> >>>>>>>> be part of public API (so users can just set in the config) with a
> >>>>>>>> proper name.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Otherwise, I like the KIP a lot! Thanks.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> -Matthias
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
> >>>> with
> >>>>>>>> that
> >>>>>>>>> PR, but there will be a merge conflict that happens whenever one
> >>>> of
> >>>>>> the
> >>>>>>>> PRs
> >>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
> >>> in
> >>>>>>>> first. :)
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> >>> wangguoz@gmail.com
> >>>>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> That sounds reasonable, thanks Matt.
> >>>>>>>>>>
> >>>>>>>>>> As for the implementation, please note that there is another
> >>>>> ongoing
> >>>>>> PR
> >>>>>>>>>> that may touch the same classes that you are working on:
> >>>>>>>>>> https://github.com/apache/kafka/pull/4148
> >>>>>>>>>>
> >>>>>>>>>> So it may help if you can also take a look at that PR and see
> >>> if
> >>>> it
> >>>>>> is
> >>>>>>>>>> compatible with your changes.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I've opened this pull request to implement the KIP as
> >>> currently
> >>>>>>>> written:
> >>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
> >>> some
> >>>>>> tests
> >>>>>>>>>>> added,
> >>>>>>>>>>> but largely represents the shape I was going for.
> >>>>>>>>>>>
> >>>>>>>>>>> If there are more points that folks would like to discuss,
> >>>> please
> >>>>>> let
> >>>>>>>> me
> >>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
> >>>> probably
> >>>>>>>> start
> >>>>>>>>>> a
> >>>>>>>>>>> [VOTE] thread.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Matt
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I can’t think of a reason that would be problematic.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Most of the time I would write a handler like this, I either
> >>>> want
> >>>>>> to
> >>>>>>>>>>>> ignore the error or fail and bring everything down so that I
> >>>> can
> >>>>>> spin
> >>>>>>>>>> it
> >>>>>>>>>>>> back up later and resume from earlier offsets. When we start
> >>> up
> >>>>>> after
> >>>>>>>>>>>> crashing we’ll eventually try to process the message we
> >>> failed
> >>>> to
> >>>>>>>>>> produce
> >>>>>>>>>>>> again.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
> >>> up
> >>>> to
> >>>>>>>>>> putting
> >>>>>>>>>>>> messages into the destination topic in an unexpected order.
> >>>>> However
> >>>>>>>> if
> >>>>>>>>>>>> others feel differently, I’m happy to talk about it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> >>>>> wangguoz@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
> >>> that
> >>>>> the
> >>>>>>>>>>> record
> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
> >>> trying
> >>>>> to
> >>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> That is right. Thanks.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I looked at the example code, and one thing I realized that
> >>>>> since
> >>>>>> we
> >>>>>>>>>> are
> >>>>>>>>>>>>> not passing the context in the handle function, we may not
> >>> be
> >>>>>>>>>> implement
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>> logic to send the fail records into another queue for future
> >>>>>>>>>> processing.
> >>>>>>>>>>>>> Would people think that would be a big issue?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <matt@frmr.me
> >>>>
> >>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
> >>> it
> >>>> so
> >>>>>>>> that
> >>>>>>>>>>> its
> >>>>>>>>>>>>>> interface, config setting, and parameters line up more
> >>>> closely
> >>>>>> with
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I believe there are a few specific questions I need to
> >>> reply
> >>>>>>>> to.....
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The question I had about then handle parameters are around
> >>>> the
> >>>>>>>>>>> record,
> >>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> >>>> generics
> >>>>> of
> >>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>> `ProducerRecord<?
> >>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
> >>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
> >>>> make
> >>>>>> it
> >>>>>>>>>>>>> harder
> >>>>>>>>>>>>>> to work with the key and value.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Also, should the handle function include the
> >>>> `RecordMetadata`
> >>>>> as
> >>>>>>>>>>> well
> >>>>>>>>>>>>> in
> >>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
> >>> that
> >>>>> the
> >>>>>>>>>>> record
> >>>>>>>>>>>>>> metadata is always null if an exception occurred while
> >>> trying
> >>>>> to
> >>>>>>>>>>>>> produce.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We may probably try to write down at least the following
> >>>>>> handling
> >>>>>>>>>>>>> logic
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>> see if the given API is sufficient for it
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
> >>>>> think.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Matt
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
> >>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
> >>> and
> >>>> am
> >>>>>>>>>>>>> planning
> >>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>> updating the KIP again with details from this conversation
> >>>>> later
> >>>>>>>>>>> this
> >>>>>>>>>>>>>> week.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> >>>>>> wangguoz@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Regarding the handle interface of
> >>>>>> ProductionExceptionHandlerResp
> >>>>>>>>>>> onse,
> >>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
> >>> added
> >>>>>> config
> >>>>>>>>>>>>> names
> >>>>>>>>>>>>>>>> (e.g. what
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 161%3A+streams+
> >>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>> described).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The question I had about then handle parameters are
> >>> around
> >>>>> the
> >>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> >>>> generics
> >>>>>> of
> >>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> >>>>> `ProducerRecord<?
> >>>>>>>>>>>>> extends
> >>>>>>>>>>>>>>>> Object, ? extends Object>`?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Also, should the handle function include the
> >>>> `RecordMetadata`
> >>>>>> as
> >>>>>>>>>>>>> well in
> >>>>>>>>>>>>>>>> case it is not null?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We may probably try to write down at least the following
> >>>>>> handling
> >>>>>>>>>>>>> logic
> >>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
> >>>> exception
> >>>>>>>>>>>>>> immediately
> >>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
> >>> drop
> >>>>>> record
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
> >>> "error"
> >>>>>> Kafka
> >>>>>>>>>>>>> topic,
> >>>>>>>>>>>>>>>> or
> >>>>>>>>>>>>>>>> record it as an app-level metrics (
> >>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> >>>>>> monitoring
> >>>>>>>>>> )
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>> monitoring.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> >>> matt@frmr.me
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I did some more digging tonight.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
> >>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
> >>>> config
> >>>>>>>>>>> name. No
> >>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
> >>> use
> >>>>>>>>>>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> >>> matt@frmr.me
> >>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
> >>>>> possible,
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>> changing
> >>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
> >>> However,
> >>>>>>>>>>> because
> >>>>>>>>>>>>>> we're
> >>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
> >>>> guarantee
> >>>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
> >>>>>>>>>>> particular
> >>>>>>>>>>>>>>>> handler
> >>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
> >>> something
> >>>>>>>>>> like:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> >>>>>>>>>>>>> ProducerRecord<..>
> >>>>>>>>>>>>>>>>> record,
> >>>>>>>>>>>>>>>>>> final Exception exception)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> >>>> matt@frmr.me>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
> >>> return
> >>>>> type
> >>>>>>>>>>> but
> >>>>>>>>>>>>>> I’m
> >>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
> >>> record
> >>>>>>>>>>> itself
> >>>>>>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>> other changes.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >>>>>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 161%3A+streams+
> >>>>>>>>>>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
> >>> naming,
> >>>>>>>>>>> return
> >>>>>>>>>>>>>>>> types,
> >>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
> >>> unified
> >>>>>>>>>> user
> >>>>>>>>>>>>>>>>> experience.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
> >>>> KIP
> >>>>>>>>>>>>>>>> accordingly.
> >>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>>>>>>>>>> yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
> >>>>>>>>>>> specified.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> >>>>>>>>>>> matt@frmr.me>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
> >>>> just
> >>>>>>>>>>> filed
> >>>>>>>>>>>>>>>> here:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> >>>> error+handling++when+Kafka+
> >>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
> >>> folks
> >>>>>>>>>> about
> >>>>>>>>>>>>> this
> >>>>>>>>>>>>>>>> idea
> >>>>>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
> >>> :)
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> -- Guozhang
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
I just review the PR, and there is one thing we should discuss.

There are different types of exceptions that could occur. Some apply to
all records (like Authorization exception) while others are for single
records only (like record too large).

For the first category, it seems to not make sense to call the handle
but Streams should always fail. If we follow this design, the KIP should
list all exceptions for which the handler is not called.

WDYT?


-Matthias


On 11/10/17 2:56 PM, Matthias J. Sax wrote:
> Just catching up on this KIP.
> 
> One tiny comment: I would prefer to remove the "Always" from the handler
> implementation names -- it sounds "cleaner" to me without it.
> 
> 
> -Matthias
> 
> On 11/5/17 12:57 PM, Matt Farmer wrote:
>> It is agreed, then. I've updated the pull request. I'm trying to also
>> update the KIP accordingly, but cwiki is being slow and dropping
>> connections..... I'll try again a bit later but please consider the KIP
>> updated for all intents and purposes. heh.
>>
>> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com> wrote:
>>
>>> That makes sense.
>>>
>>>
>>> Guozhang
>>>
>>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>>>
>>>> Interesting. I'm not sure I agree. I've been bitten many times by
>>>> unintentionally shipping code that fails to properly implement logging. I
>>>> always discover this at the exact *worst* moment, too. (Normally at 3 AM
>>>> during an on-call shift. Hah.) However, if others feel the same way I
>>> could
>>>> probably be convinced to remove it.
>>>>
>>>> We could also meet halfway and say that when a customized
>>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
>>>> level instead of WARN level. Would that alternative be appealing to you?
>>>>
>>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
>>> wrote:
>>>>
>>>>> Thanks for the updates. I made a pass over the wiki again and it looks
>>>>> good.
>>>>>
>>>>> About whether record collector should still internally log the error in
>>>>> addition to what the customized ProductionExceptionHandler does. I
>>>>> personally would prefer only to log if the returned value is FAIL to
>>>>> indicate that this thread is going to shutdown and trigger the
>>> exception
>>>>> handler.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hello, a bit later than I'd anticipated, but I've updated this KIP as
>>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>>
>>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>
>>>>>>> Ah. I actually created both of those in the PR and forgot to
>>> mention
>>>>> them
>>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>>
>>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>>
>>>>>>> The logging is actually provided for in the record collector.
>>>> Whenever
>>>>> a
>>>>>>> handler continues it’ll log a warning to ensure that it’s
>>>> *impossible*
>>>>> to
>>>>>>> write a handler that totally suppresses production exceptions from
>>>> the
>>>>>> log.
>>>>>>> As such, the default continue handler just returns the continue
>>>> value.
>>>>> I
>>>>>>> can add details about those semantics to the KIP as well.
>>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>>> matthias@confluent.io
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> One more comment.
>>>>>>>>
>>>>>>>> You mention a default implementation for the handler that fails. I
>>>>>>>> think, this should be part of the public API and thus should have
>>> a
>>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>>
>>>>>>>> We could also add a second implementation for the log-and-move-on
>>>>>>>> strategy, as both are the two most common cases. This class should
>>>>> also
>>>>>>>> be part of public API (so users can just set in the config) with a
>>>>>>>> proper name.
>>>>>>>>
>>>>>>>>
>>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
>>>> with
>>>>>>>> that
>>>>>>>>> PR, but there will be a merge conflict that happens whenever one
>>>> of
>>>>>> the
>>>>>>>> PRs
>>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
>>> in
>>>>>>>> first. :)
>>>>>>>>>
>>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>>> wangguoz@gmail.com
>>>>>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>>
>>>>>>>>>> As for the implementation, please note that there is another
>>>>> ongoing
>>>>>> PR
>>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>>
>>>>>>>>>> So it may help if you can also take a look at that PR and see
>>> if
>>>> it
>>>>>> is
>>>>>>>>>> compatible with your changes.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I've opened this pull request to implement the KIP as
>>> currently
>>>>>>>> written:
>>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>>> some
>>>>>> tests
>>>>>>>>>>> added,
>>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>>
>>>>>>>>>>> If there are more points that folks would like to discuss,
>>>> please
>>>>>> let
>>>>>>>> me
>>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>>> probably
>>>>>>>> start
>>>>>>>>>> a
>>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Matt
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>>
>>>>>>>>>>>> Most of the time I would write a handler like this, I either
>>>> want
>>>>>> to
>>>>>>>>>>>> ignore the error or fail and bring everything down so that I
>>>> can
>>>>>> spin
>>>>>>>>>> it
>>>>>>>>>>>> back up later and resume from earlier offsets. When we start
>>> up
>>>>>> after
>>>>>>>>>>>> crashing we’ll eventually try to process the message we
>>> failed
>>>> to
>>>>>>>>>> produce
>>>>>>>>>>>> again.
>>>>>>>>>>>>
>>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
>>> up
>>>> to
>>>>>>>>>> putting
>>>>>>>>>>>> messages into the destination topic in an unexpected order.
>>>>> However
>>>>>>>> if
>>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>>> wangguoz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>> that
>>>>> the
>>>>>>>>>>> record
>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>> trying
>>>>> to
>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I looked at the example code, and one thing I realized that
>>>>> since
>>>>>> we
>>>>>>>>>> are
>>>>>>>>>>>>> not passing the context in the handle function, we may not
>>> be
>>>>>>>>>> implement
>>>>>>>>>>>>> the
>>>>>>>>>>>>> logic to send the fail records into another queue for future
>>>>>>>>>> processing.
>>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <matt@frmr.me
>>>>
>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
>>> it
>>>> so
>>>>>>>> that
>>>>>>>>>>> its
>>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>>> closely
>>>>>> with
>>>>>>>>>>> the
>>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I believe there are a few specific questions I need to
>>> reply
>>>>>>>> to.....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The question I had about then handle parameters are around
>>>> the
>>>>>>>>>>> record,
>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>> generics
>>>>> of
>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>> `ProducerRecord<?
>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
>>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
>>>> make
>>>>>> it
>>>>>>>>>>>>> harder
>>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, should the handle function include the
>>>> `RecordMetadata`
>>>>> as
>>>>>>>>>>> well
>>>>>>>>>>>>> in
>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>>> that
>>>>> the
>>>>>>>>>>> record
>>>>>>>>>>>>>> metadata is always null if an exception occurred while
>>> trying
>>>>> to
>>>>>>>>>>>>> produce.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>>> handling
>>>>>>>>>>>>> logic
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
>>>>> think.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
>>> and
>>>> am
>>>>>>>>>>>>> planning
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> updating the KIP again with details from this conversation
>>>>> later
>>>>>>>>>>> this
>>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>>> wangguoz@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>>> ProductionExceptionHandlerResp
>>>>>>>>>>> onse,
>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>>> added
>>>>>> config
>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 161%3A+streams+
>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The question I had about then handle parameters are
>>> around
>>>>> the
>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>>> generics
>>>>>> of
>>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>>> `ProducerRecord<?
>>>>>>>>>>>>> extends
>>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, should the handle function include the
>>>> `RecordMetadata`
>>>>>> as
>>>>>>>>>>>>> well in
>>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>>> handling
>>>>>>>>>>>>> logic
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>>> exception
>>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>>> drop
>>>>>> record
>>>>>>>>>>> and
>>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>>> "error"
>>>>>> Kafka
>>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>>> monitoring
>>>>>>>>>> )
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>>> matt@frmr.me
>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>>> config
>>>>>>>>>>> name. No
>>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
>>> use
>>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>>> matt@frmr.me
>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>>> possible,
>>>>>>>>>>> and
>>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>>> However,
>>>>>>>>>>> because
>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>>> guarantee
>>>>>>>>>>> that
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>>>>>>>>>>> particular
>>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>>> something
>>>>>>>>>> like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>>> matt@frmr.me>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>>> return
>>>>> type
>>>>>>>>>>> but
>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>>> record
>>>>>>>>>>> itself
>>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>> 161%3A+streams+
>>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>>> naming,
>>>>>>>>>>> return
>>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>>> unified
>>>>>>>>>> user
>>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
>>>> KIP
>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
>>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
>>>> just
>>>>>>>>>>> filed
>>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>>> folks
>>>>>>>>>> about
>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
>>> :)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Just catching up on this KIP.

One tiny comment: I would prefer to remove the "Always" from the handler
implementation names -- it sounds "cleaner" to me without it.


-Matthias

On 11/5/17 12:57 PM, Matt Farmer wrote:
> It is agreed, then. I've updated the pull request. I'm trying to also
> update the KIP accordingly, but cwiki is being slow and dropping
> connections..... I'll try again a bit later but please consider the KIP
> updated for all intents and purposes. heh.
> 
> On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com> wrote:
> 
>> That makes sense.
>>
>>
>> Guozhang
>>
>> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>>
>>> Interesting. I'm not sure I agree. I've been bitten many times by
>>> unintentionally shipping code that fails to properly implement logging. I
>>> always discover this at the exact *worst* moment, too. (Normally at 3 AM
>>> during an on-call shift. Hah.) However, if others feel the same way I
>> could
>>> probably be convinced to remove it.
>>>
>>> We could also meet halfway and say that when a customized
>>> ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
>>> level instead of WARN level. Would that alternative be appealing to you?
>>>
>>> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>>>
>>>> Thanks for the updates. I made a pass over the wiki again and it looks
>>>> good.
>>>>
>>>> About whether record collector should still internally log the error in
>>>> addition to what the customized ProductionExceptionHandler does. I
>>>> personally would prefer only to log if the returned value is FAIL to
>>>> indicate that this thread is going to shutdown and trigger the
>> exception
>>>> handler.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>>>>
>>>>> Hello, a bit later than I'd anticipated, but I've updated this KIP as
>>>>> outlined above. The updated KIP is now ready for review again!
>>>>>
>>>>> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Ah. I actually created both of those in the PR and forgot to
>> mention
>>>> them
>>>>>> by name in the KIP! Thanks for pointing out the oversight.
>>>>>>
>>>>>> I’ll revise the KIP this afternoon accordingly.
>>>>>>
>>>>>> The logging is actually provided for in the record collector.
>>> Whenever
>>>> a
>>>>>> handler continues it’ll log a warning to ensure that it’s
>>> *impossible*
>>>> to
>>>>>> write a handler that totally suppresses production exceptions from
>>> the
>>>>> log.
>>>>>> As such, the default continue handler just returns the continue
>>> value.
>>>> I
>>>>>> can add details about those semantics to the KIP as well.
>>>>>> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
>>> matthias@confluent.io
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> One more comment.
>>>>>>>
>>>>>>> You mention a default implementation for the handler that fails. I
>>>>>>> think, this should be part of the public API and thus should have
>> a
>>>>>>> proper defined named that is mentioned in the KIP.
>>>>>>>
>>>>>>> We could also add a second implementation for the log-and-move-on
>>>>>>> strategy, as both are the two most common cases. This class should
>>>> also
>>>>>>> be part of public API (so users can just set in the config) with a
>>>>>>> proper name.
>>>>>>>
>>>>>>>
>>>>>>> Otherwise, I like the KIP a lot! Thanks.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>>>>>>>> Thanks for the heads up. Yes, I think my changes are compatible
>>> with
>>>>>>> that
>>>>>>>> PR, but there will be a merge conflict that happens whenever one
>>> of
>>>>> the
>>>>>>> PRs
>>>>>>>> is merged. Happy to reconcile the changes in my PR if 4148 goes
>> in
>>>>>>> first. :)
>>>>>>>>
>>>>>>>> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
>> wangguoz@gmail.com
>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> That sounds reasonable, thanks Matt.
>>>>>>>>>
>>>>>>>>> As for the implementation, please note that there is another
>>>> ongoing
>>>>> PR
>>>>>>>>> that may touch the same classes that you are working on:
>>>>>>>>> https://github.com/apache/kafka/pull/4148
>>>>>>>>>
>>>>>>>>> So it may help if you can also take a look at that PR and see
>> if
>>> it
>>>>> is
>>>>>>>>> compatible with your changes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I've opened this pull request to implement the KIP as
>> currently
>>>>>>> written:
>>>>>>>>>> https://github.com/apache/kafka/pull/4165. It still needs
>> some
>>>>> tests
>>>>>>>>>> added,
>>>>>>>>>> but largely represents the shape I was going for.
>>>>>>>>>>
>>>>>>>>>> If there are more points that folks would like to discuss,
>>> please
>>>>> let
>>>>>>> me
>>>>>>>>>> know. If I don't hear anything by tomorrow afternoon I'll
>>> probably
>>>>>>> start
>>>>>>>>> a
>>>>>>>>>> [VOTE] thread.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Matt
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I can’t think of a reason that would be problematic.
>>>>>>>>>>>
>>>>>>>>>>> Most of the time I would write a handler like this, I either
>>> want
>>>>> to
>>>>>>>>>>> ignore the error or fail and bring everything down so that I
>>> can
>>>>> spin
>>>>>>>>> it
>>>>>>>>>>> back up later and resume from earlier offsets. When we start
>> up
>>>>> after
>>>>>>>>>>> crashing we’ll eventually try to process the message we
>> failed
>>> to
>>>>>>>>> produce
>>>>>>>>>>> again.
>>>>>>>>>>>
>>>>>>>>>>> I’m concerned that “putting in a queue for later” opens you
>> up
>>> to
>>>>>>>>> putting
>>>>>>>>>>> messages into the destination topic in an unexpected order.
>>>> However
>>>>>>> if
>>>>>>>>>>> others feel differently, I’m happy to talk about it.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
>>>> wangguoz@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> that
>>>> the
>>>>>>>>>> record
>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> trying
>>>> to
>>>>>>>>>>>> produce.
>>>>>>>>>>>>
>>>>>>>>>>>> That is right. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> I looked at the example code, and one thing I realized that
>>>> since
>>>>> we
>>>>>>>>> are
>>>>>>>>>>>> not passing the context in the handle function, we may not
>> be
>>>>>>>>> implement
>>>>>>>>>>>> the
>>>>>>>>>>>> logic to send the fail records into another queue for future
>>>>>>>>> processing.
>>>>>>>>>>>> Would people think that would be a big issue?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <matt@frmr.me
>>>
>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've updated the KIP based on this conversation, and made
>> it
>>> so
>>>>>>> that
>>>>>>>>>> its
>>>>>>>>>>>>> interface, config setting, and parameters line up more
>>> closely
>>>>> with
>>>>>>>>>> the
>>>>>>>>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe there are a few specific questions I need to
>> reply
>>>>>>> to.....
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The question I had about then handle parameters are around
>>> the
>>>>>>>>>> record,
>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>> generics
>>>> of
>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>> `ProducerRecord<?
>>>>>>>>>>>> extends
>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>
>>>>>>>>>>>>> At this point in the code we're guaranteed that this is a
>>>>>>>>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
>>> make
>>>>> it
>>>>>>>>>>>> harder
>>>>>>>>>>>>> to work with the key and value.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, should the handle function include the
>>> `RecordMetadata`
>>>> as
>>>>>>>>>> well
>>>>>>>>>>>> in
>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please correct me if I'm wrong, but my understanding is
>> that
>>>> the
>>>>>>>>>> record
>>>>>>>>>>>>> metadata is always null if an exception occurred while
>> trying
>>>> to
>>>>>>>>>>>> produce.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>> handling
>>>>>>>>>>>> logic
>>>>>>>>>>>>> and
>>>>>>>>>>>>>> see if the given API is sufficient for it
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've added some examples to the KIP. Let me know what you
>>>> think.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for this feedback. I’m at a conference right now
>> and
>>> am
>>>>>>>>>>>> planning
>>>>>>>>>>>>> on
>>>>>>>>>>>>>> updating the KIP again with details from this conversation
>>>> later
>>>>>>>>>> this
>>>>>>>>>>>>> week.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>>>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
>>>>> wangguoz@gmail.com
>>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regarding the handle interface of
>>>>> ProductionExceptionHandlerResp
>>>>>>>>>> onse,
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> you write it on the wiki also, along with the actual
>> added
>>>>> config
>>>>>>>>>>>> names
>>>>>>>>>>>>>>> (e.g. what
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 161%3A+streams+
>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>> described).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The question I had about then handle parameters are
>> around
>>>> the
>>>>>>>>>>>> record,
>>>>>>>>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
>>> generics
>>>>> of
>>>>>>>>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
>>>> `ProducerRecord<?
>>>>>>>>>>>> extends
>>>>>>>>>>>>>>> Object, ? extends Object>`?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, should the handle function include the
>>> `RecordMetadata`
>>>>> as
>>>>>>>>>>>> well in
>>>>>>>>>>>>>>> case it is not null?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We may probably try to write down at least the following
>>>>> handling
>>>>>>>>>>>> logic
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> see if the given API is sufficient for it: 1) throw
>>> exception
>>>>>>>>>>>>> immediately
>>>>>>>>>>>>>>> to fail fast and stop the world, 2) log the error and
>> drop
>>>>> record
>>>>>>>>>> and
>>>>>>>>>>>>>>> proceed silently, 3) send such errors to a specific
>> "error"
>>>>> Kafka
>>>>>>>>>>>> topic,
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>> record it as an app-level metrics (
>>>>>>>>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
>>>>> monitoring
>>>>>>>>> )
>>>>>>>>>>>> for
>>>>>>>>>>>>>>> monitoring.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
>> matt@frmr.me
>>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I did some more digging tonight.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>>>>>>>>>>>>>>> "default.deserialization.exception.handler" for the
>>> config
>>>>>>>>>> name. No
>>>>>>>>>>>>>>>> ".class" on the end. I'm inclined to think this should
>> use
>>>>>>>>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
>> matt@frmr.me
>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think getting access to the serialized record is
>>>> possible,
>>>>>>>>>> and
>>>>>>>>>>>>>>> changing
>>>>>>>>>>>>>>>>> the naming and return type is certainly doable.
>> However,
>>>>>>>>>> because
>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>>> hooking into the onCompletion callback we have no
>>> guarantee
>>>>>>>>>> that
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>>>>>>>>>> particular
>>>>>>>>>>>>>>> handler
>>>>>>>>>>>>>>>>> runs. So I think the signature would change to
>> something
>>>>>>>>> like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>>>>>>>>> ProducerRecord<..>
>>>>>>>>>>>>>>>> record,
>>>>>>>>>>>>>>>>> final Exception exception)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
>>> matt@frmr.me>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
>> return
>>>> type
>>>>>>>>>> but
>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>> sure if I can get my hands on the context and the
>> record
>>>>>>>>>> itself
>>>>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>>> other changes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 161%3A+streams+
>>>>>>>>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I thinks, we should align the design (parameter
>> naming,
>>>>>>>>>> return
>>>>>>>>>>>>>>> types,
>>>>>>>>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
>> unified
>>>>>>>>> user
>>>>>>>>>>>>>>>> experience.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think that config name will work. I’ll update the
>>> KIP
>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>>>>>>>>> yuzhihong@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>>>>>>>>> ? This way it is clear that class name should be
>>>>>>>>>> specified.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>>>>>>>>> matt@frmr.me>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
>>> just
>>>>>>>>>> filed
>>>>>>>>>>>>>>> here:
>>>>>>>>>>>>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>> 210+-+Provide+for+custom+
>>> error+handling++when+Kafka+
>>>>>>>>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Looking forward to getting some feedback from
>> folks
>>>>>>>>> about
>>>>>>>>>>>> this
>>>>>>>>>>>>>>> idea
>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> working toward a solution we can contribute back.
>> :)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
It is agreed, then. I've updated the pull request. I'm trying to also
update the KIP accordingly, but cwiki is being slow and dropping
connections..... I'll try again a bit later but please consider the KIP
updated for all intents and purposes. heh.

On Sun, Nov 5, 2017 at 3:45 PM Guozhang Wang <wa...@gmail.com> wrote:

> That makes sense.
>
>
> Guozhang
>
> On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:
>
> > Interesting. I'm not sure I agree. I've been bitten many times by
> > unintentionally shipping code that fails to properly implement logging. I
> > always discover this at the exact *worst* moment, too. (Normally at 3 AM
> > during an on-call shift. Hah.) However, if others feel the same way I
> could
> > probably be convinced to remove it.
> >
> > We could also meet halfway and say that when a customized
> > ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
> > level instead of WARN level. Would that alternative be appealing to you?
> >
> > On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Thanks for the updates. I made a pass over the wiki again and it looks
> > > good.
> > >
> > > About whether record collector should still internally log the error in
> > > addition to what the customized ProductionExceptionHandler does. I
> > > personally would prefer only to log if the returned value is FAIL to
> > > indicate that this thread is going to shutdown and trigger the
> exception
> > > handler.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
> > >
> > > > Hello, a bit later than I'd anticipated, but I've updated this KIP as
> > > > outlined above. The updated KIP is now ready for review again!
> > > >
> > > > On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
> > > >
> > > > > Ah. I actually created both of those in the PR and forgot to
> mention
> > > them
> > > > > by name in the KIP! Thanks for pointing out the oversight.
> > > > >
> > > > > I’ll revise the KIP this afternoon accordingly.
> > > > >
> > > > > The logging is actually provided for in the record collector.
> > Whenever
> > > a
> > > > > handler continues it’ll log a warning to ensure that it’s
> > *impossible*
> > > to
> > > > > write a handler that totally suppresses production exceptions from
> > the
> > > > log.
> > > > > As such, the default continue handler just returns the continue
> > value.
> > > I
> > > > > can add details about those semantics to the KIP as well.
> > > > > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> > matthias@confluent.io
> > > >
> > > > > wrote:
> > > > >
> > > > >> One more comment.
> > > > >>
> > > > >> You mention a default implementation for the handler that fails. I
> > > > >> think, this should be part of the public API and thus should have
> a
> > > > >> proper defined named that is mentioned in the KIP.
> > > > >>
> > > > >> We could also add a second implementation for the log-and-move-on
> > > > >> strategy, as both are the two most common cases. This class should
> > > also
> > > > >> be part of public API (so users can just set in the config) with a
> > > > >> proper name.
> > > > >>
> > > > >>
> > > > >> Otherwise, I like the KIP a lot! Thanks.
> > > > >>
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >>
> > > > >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > > > >> > Thanks for the heads up. Yes, I think my changes are compatible
> > with
> > > > >> that
> > > > >> > PR, but there will be a merge conflict that happens whenever one
> > of
> > > > the
> > > > >> PRs
> > > > >> > is merged. Happy to reconcile the changes in my PR if 4148 goes
> in
> > > > >> first. :)
> > > > >> >
> > > > >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <
> wangguoz@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> >> That sounds reasonable, thanks Matt.
> > > > >> >>
> > > > >> >> As for the implementation, please note that there is another
> > > ongoing
> > > > PR
> > > > >> >> that may touch the same classes that you are working on:
> > > > >> >> https://github.com/apache/kafka/pull/4148
> > > > >> >>
> > > > >> >> So it may help if you can also take a look at that PR and see
> if
> > it
> > > > is
> > > > >> >> compatible with your changes.
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> Guozhang
> > > > >> >>
> > > > >> >>
> > > > >> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
> > > wrote:
> > > > >> >>
> > > > >> >>> I've opened this pull request to implement the KIP as
> currently
> > > > >> written:
> > > > >> >>> https://github.com/apache/kafka/pull/4165. It still needs
> some
> > > > tests
> > > > >> >>> added,
> > > > >> >>> but largely represents the shape I was going for.
> > > > >> >>>
> > > > >> >>> If there are more points that folks would like to discuss,
> > please
> > > > let
> > > > >> me
> > > > >> >>> know. If I don't hear anything by tomorrow afternoon I'll
> > probably
> > > > >> start
> > > > >> >> a
> > > > >> >>> [VOTE] thread.
> > > > >> >>>
> > > > >> >>> Thanks,
> > > > >> >>> Matt
> > > > >> >>>
> > > > >> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> > wrote:
> > > > >> >>>
> > > > >> >>>> I can’t think of a reason that would be problematic.
> > > > >> >>>>
> > > > >> >>>> Most of the time I would write a handler like this, I either
> > want
> > > > to
> > > > >> >>>> ignore the error or fail and bring everything down so that I
> > can
> > > > spin
> > > > >> >> it
> > > > >> >>>> back up later and resume from earlier offsets. When we start
> up
> > > > after
> > > > >> >>>> crashing we’ll eventually try to process the message we
> failed
> > to
> > > > >> >> produce
> > > > >> >>>> again.
> > > > >> >>>>
> > > > >> >>>> I’m concerned that “putting in a queue for later” opens you
> up
> > to
> > > > >> >> putting
> > > > >> >>>> messages into the destination topic in an unexpected order.
> > > However
> > > > >> if
> > > > >> >>>> others feel differently, I’m happy to talk about it.
> > > > >> >>>>
> > > > >> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> > > wangguoz@gmail.com>
> > > > >> >>> wrote:
> > > > >> >>>>
> > > > >> >>>>>> Please correct me if I'm wrong, but my understanding is
> that
> > > the
> > > > >> >>> record
> > > > >> >>>>>> metadata is always null if an exception occurred while
> trying
> > > to
> > > > >> >>>>> produce.
> > > > >> >>>>>
> > > > >> >>>>> That is right. Thanks.
> > > > >> >>>>>
> > > > >> >>>>> I looked at the example code, and one thing I realized that
> > > since
> > > > we
> > > > >> >> are
> > > > >> >>>>> not passing the context in the handle function, we may not
> be
> > > > >> >> implement
> > > > >> >>>>> the
> > > > >> >>>>> logic to send the fail records into another queue for future
> > > > >> >> processing.
> > > > >> >>>>> Would people think that would be a big issue?
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Guozhang
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <matt@frmr.me
> >
> > > > wrote:
> > > > >> >>>>>
> > > > >> >>>>>> Hello all,
> > > > >> >>>>>>
> > > > >> >>>>>> I've updated the KIP based on this conversation, and made
> it
> > so
> > > > >> that
> > > > >> >>> its
> > > > >> >>>>>> interface, config setting, and parameters line up more
> > closely
> > > > with
> > > > >> >>> the
> > > > >> >>>>>> interface in KIP-161 (deserialization handler).
> > > > >> >>>>>>
> > > > >> >>>>>> I believe there are a few specific questions I need to
> reply
> > > > >> to.....
> > > > >> >>>>>>
> > > > >> >>>>>>> The question I had about then handle parameters are around
> > the
> > > > >> >>> record,
> > > > >> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> > generics
> > > of
> > > > >> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > > `ProducerRecord<?
> > > > >> >>>>> extends
> > > > >> >>>>>>> Object, ? extends Object>`?
> > > > >> >>>>>>
> > > > >> >>>>>> At this point in the code we're guaranteed that this is a
> > > > >> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
> > make
> > > > it
> > > > >> >>>>> harder
> > > > >> >>>>>> to work with the key and value.
> > > > >> >>>>>>
> > > > >> >>>>>>> Also, should the handle function include the
> > `RecordMetadata`
> > > as
> > > > >> >>> well
> > > > >> >>>>> in
> > > > >> >>>>>>> case it is not null?
> > > > >> >>>>>>
> > > > >> >>>>>> Please correct me if I'm wrong, but my understanding is
> that
> > > the
> > > > >> >>> record
> > > > >> >>>>>> metadata is always null if an exception occurred while
> trying
> > > to
> > > > >> >>>>> produce.
> > > > >> >>>>>>
> > > > >> >>>>>>> We may probably try to write down at least the following
> > > > handling
> > > > >> >>>>> logic
> > > > >> >>>>>> and
> > > > >> >>>>>>> see if the given API is sufficient for it
> > > > >> >>>>>>
> > > > >> >>>>>> I've added some examples to the KIP. Let me know what you
> > > think.
> > > > >> >>>>>>
> > > > >> >>>>>> Cheers,
> > > > >> >>>>>> Matt
> > > > >> >>>>>>
> > > > >> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
> > > > wrote:
> > > > >> >>>>>>
> > > > >> >>>>>>> Thanks for this feedback. I’m at a conference right now
> and
> > am
> > > > >> >>>>> planning
> > > > >> >>>>>> on
> > > > >> >>>>>>> updating the KIP again with details from this conversation
> > > later
> > > > >> >>> this
> > > > >> >>>>>> week.
> > > > >> >>>>>>>
> > > > >> >>>>>>> I’ll shoot you a more detailed response then! :)
> > > > >> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> > > > wangguoz@gmail.com
> > > > >> >>>
> > > > >> >>>>>> wrote:
> > > > >> >>>>>>>
> > > > >> >>>>>>>> Thanks for the KIP Matt.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Regarding the handle interface of
> > > > ProductionExceptionHandlerResp
> > > > >> >>> onse,
> > > > >> >>>>>>>> could
> > > > >> >>>>>>>> you write it on the wiki also, along with the actual
> added
> > > > config
> > > > >> >>>>> names
> > > > >> >>>>>>>> (e.g. what
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>
> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 161%3A+streams+
> > > > >> >>>>>> deserialization+exception+handlers
> > > > >> >>>>>>>> described).
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> The question I had about then handle parameters are
> around
> > > the
> > > > >> >>>>> record,
> > > > >> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> > generics
> > > > of
> > > > >> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > > `ProducerRecord<?
> > > > >> >>>>> extends
> > > > >> >>>>>>>> Object, ? extends Object>`?
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Also, should the handle function include the
> > `RecordMetadata`
> > > > as
> > > > >> >>>>> well in
> > > > >> >>>>>>>> case it is not null?
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> We may probably try to write down at least the following
> > > > handling
> > > > >> >>>>> logic
> > > > >> >>>>>>>> and
> > > > >> >>>>>>>> see if the given API is sufficient for it: 1) throw
> > exception
> > > > >> >>>>>> immediately
> > > > >> >>>>>>>> to fail fast and stop the world, 2) log the error and
> drop
> > > > record
> > > > >> >>> and
> > > > >> >>>>>>>> proceed silently, 3) send such errors to a specific
> "error"
> > > > Kafka
> > > > >> >>>>> topic,
> > > > >> >>>>>>>> or
> > > > >> >>>>>>>> record it as an app-level metrics (
> > > > >> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> > > > monitoring
> > > > >> >> )
> > > > >> >>>>> for
> > > > >> >>>>>>>> monitoring.
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> Guozhang
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <
> matt@frmr.me
> > >
> > > > >> >> wrote:
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>> I did some more digging tonight.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> @Ted: It looks like the deserialization handler uses
> > > > >> >>>>>>>>> "default.deserialization.exception.handler" for the
> > config
> > > > >> >>> name. No
> > > > >> >>>>>>>>> ".class" on the end. I'm inclined to think this should
> use
> > > > >> >>>>>>>>> "default.production.exception.handler".
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <
> matt@frmr.me
> > >
> > > > >> >>> wrote:
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>>> Okay, I've dug into this a little bit.
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> I think getting access to the serialized record is
> > > possible,
> > > > >> >>> and
> > > > >> >>>>>>>> changing
> > > > >> >>>>>>>>>> the naming and return type is certainly doable.
> However,
> > > > >> >>> because
> > > > >> >>>>>> we're
> > > > >> >>>>>>>>>> hooking into the onCompletion callback we have no
> > guarantee
> > > > >> >>> that
> > > > >> >>>>> the
> > > > >> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
> > > > >> >>> particular
> > > > >> >>>>>>>> handler
> > > > >> >>>>>>>>>> runs. So I think the signature would change to
> something
> > > > >> >> like:
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> > > > >> >>>>> ProducerRecord<..>
> > > > >> >>>>>>>>> record,
> > > > >> >>>>>>>>>> final Exception exception)
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> Would this be acceptable?
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> > matt@frmr.me>
> > > > >> >>>>> wrote:
> > > > >> >>>>>>>>>>
> > > > >> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and
> return
> > > type
> > > > >> >>> but
> > > > >> >>>>>> I’m
> > > > >> >>>>>>>> not
> > > > >> >>>>>>>>>>> sure if I can get my hands on the context and the
> record
> > > > >> >>> itself
> > > > >> >>>>>>>> without
> > > > >> >>>>>>>>>>> other changes.
> > > > >> >>>>>>>>>>>
> > > > >> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
> > > > >> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> > > > >> >>>>>>>> matthias@confluent.io>
> > > > >> >>>>>>>>>>> wrote:
> > > > >> >>>>>>>>>>>
> > > > >> >>>>>>>>>>>> Thanks for the KIP.
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>> Are you familiar with KIP-161?
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>
> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 161%3A+streams+
> > > > >> >>>>>>>>> deserialization+exception+handlers
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>> I thinks, we should align the design (parameter
> naming,
> > > > >> >>> return
> > > > >> >>>>>>>> types,
> > > > >> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a
> unified
> > > > >> >> user
> > > > >> >>>>>>>>> experience.
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>> -Matthias
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > > > >> >>>>>>>>>>>>> I’ll create the JIRA ticket.
> > > > >> >>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>> I think that config name will work. I’ll update the
> > KIP
> > > > >> >>>>>>>> accordingly.
> > > > >> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> > > > >> >>> yuzhihong@gmail.com>
> > > > >> >>>>>>>> wrote:
> > > > >> >>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> > > > >> >>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>> For the new config, how about naming it
> > > > >> >>>>>>>>>>>>>> production.exception.processor.class
> > > > >> >>>>>>>>>>>>>> ? This way it is clear that class name should be
> > > > >> >>> specified.
> > > > >> >>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>> Cheers
> > > > >> >>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> > > > >> >>> matt@frmr.me>
> > > > >> >>>>>>>> wrote:
> > > > >> >>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>>> Hello everyone,
> > > > >> >>>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
> > just
> > > > >> >>> filed
> > > > >> >>>>>>>> here:
> > > > >> >>>>>>>>>>>>>>>
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > >> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> > error+handling++when+Kafka+
> > > > >> >>>>>>>>>>>>>>> Streams+fails+to+produce
> > > > >> >>>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>>> Looking forward to getting some feedback from
> folks
> > > > >> >> about
> > > > >> >>>>> this
> > > > >> >>>>>>>> idea
> > > > >> >>>>>>>>>>>> and
> > > > >> >>>>>>>>>>>>>>> working toward a solution we can contribute back.
> :)
> > > > >> >>>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>>> Cheers,
> > > > >> >>>>>>>>>>>>>>> Matt Farmer
> > > > >> >>>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>>>>
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>>
> > > > >> >>>>>>>> --
> > > > >> >>>>>>>> -- Guozhang
> > > > >> >>>>>>>>
> > > > >> >>>>>>>
> > > > >> >>>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> --
> > > > >> >>>>> -- Guozhang
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> --
> > > > >> >> -- Guozhang
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
That makes sense.


Guozhang

On Sun, Nov 5, 2017 at 12:33 PM, Matt Farmer <ma...@frmr.me> wrote:

> Interesting. I'm not sure I agree. I've been bitten many times by
> unintentionally shipping code that fails to properly implement logging. I
> always discover this at the exact *worst* moment, too. (Normally at 3 AM
> during an on-call shift. Hah.) However, if others feel the same way I could
> probably be convinced to remove it.
>
> We could also meet halfway and say that when a customized
> ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
> level instead of WARN level. Would that alternative be appealing to you?
>
> On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Thanks for the updates. I made a pass over the wiki again and it looks
> > good.
> >
> > About whether record collector should still internally log the error in
> > addition to what the customized ProductionExceptionHandler does. I
> > personally would prefer only to log if the returned value is FAIL to
> > indicate that this thread is going to shutdown and trigger the exception
> > handler.
> >
> >
> > Guozhang
> >
> >
> > On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
> >
> > > Hello, a bit later than I'd anticipated, but I've updated this KIP as
> > > outlined above. The updated KIP is now ready for review again!
> > >
> > > On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
> > >
> > > > Ah. I actually created both of those in the PR and forgot to mention
> > them
> > > > by name in the KIP! Thanks for pointing out the oversight.
> > > >
> > > > I’ll revise the KIP this afternoon accordingly.
> > > >
> > > > The logging is actually provided for in the record collector.
> Whenever
> > a
> > > > handler continues it’ll log a warning to ensure that it’s
> *impossible*
> > to
> > > > write a handler that totally suppresses production exceptions from
> the
> > > log.
> > > > As such, the default continue handler just returns the continue
> value.
> > I
> > > > can add details about those semantics to the KIP as well.
> > > > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <
> matthias@confluent.io
> > >
> > > > wrote:
> > > >
> > > >> One more comment.
> > > >>
> > > >> You mention a default implementation for the handler that fails. I
> > > >> think, this should be part of the public API and thus should have a
> > > >> proper defined named that is mentioned in the KIP.
> > > >>
> > > >> We could also add a second implementation for the log-and-move-on
> > > >> strategy, as both are the two most common cases. This class should
> > also
> > > >> be part of public API (so users can just set in the config) with a
> > > >> proper name.
> > > >>
> > > >>
> > > >> Otherwise, I like the KIP a lot! Thanks.
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >>
> > > >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > > >> > Thanks for the heads up. Yes, I think my changes are compatible
> with
> > > >> that
> > > >> > PR, but there will be a merge conflict that happens whenever one
> of
> > > the
> > > >> PRs
> > > >> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> > > >> first. :)
> > > >> >
> > > >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wangguoz@gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> >> That sounds reasonable, thanks Matt.
> > > >> >>
> > > >> >> As for the implementation, please note that there is another
> > ongoing
> > > PR
> > > >> >> that may touch the same classes that you are working on:
> > > >> >> https://github.com/apache/kafka/pull/4148
> > > >> >>
> > > >> >> So it may help if you can also take a look at that PR and see if
> it
> > > is
> > > >> >> compatible with your changes.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> Guozhang
> > > >> >>
> > > >> >>
> > > >> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
> > wrote:
> > > >> >>
> > > >> >>> I've opened this pull request to implement the KIP as currently
> > > >> written:
> > > >> >>> https://github.com/apache/kafka/pull/4165. It still needs some
> > > tests
> > > >> >>> added,
> > > >> >>> but largely represents the shape I was going for.
> > > >> >>>
> > > >> >>> If there are more points that folks would like to discuss,
> please
> > > let
> > > >> me
> > > >> >>> know. If I don't hear anything by tomorrow afternoon I'll
> probably
> > > >> start
> > > >> >> a
> > > >> >>> [VOTE] thread.
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>> Matt
> > > >> >>>
> > > >> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> wrote:
> > > >> >>>
> > > >> >>>> I can’t think of a reason that would be problematic.
> > > >> >>>>
> > > >> >>>> Most of the time I would write a handler like this, I either
> want
> > > to
> > > >> >>>> ignore the error or fail and bring everything down so that I
> can
> > > spin
> > > >> >> it
> > > >> >>>> back up later and resume from earlier offsets. When we start up
> > > after
> > > >> >>>> crashing we’ll eventually try to process the message we failed
> to
> > > >> >> produce
> > > >> >>>> again.
> > > >> >>>>
> > > >> >>>> I’m concerned that “putting in a queue for later” opens you up
> to
> > > >> >> putting
> > > >> >>>> messages into the destination topic in an unexpected order.
> > However
> > > >> if
> > > >> >>>> others feel differently, I’m happy to talk about it.
> > > >> >>>>
> > > >> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> > wangguoz@gmail.com>
> > > >> >>> wrote:
> > > >> >>>>
> > > >> >>>>>> Please correct me if I'm wrong, but my understanding is that
> > the
> > > >> >>> record
> > > >> >>>>>> metadata is always null if an exception occurred while trying
> > to
> > > >> >>>>> produce.
> > > >> >>>>>
> > > >> >>>>> That is right. Thanks.
> > > >> >>>>>
> > > >> >>>>> I looked at the example code, and one thing I realized that
> > since
> > > we
> > > >> >> are
> > > >> >>>>> not passing the context in the handle function, we may not be
> > > >> >> implement
> > > >> >>>>> the
> > > >> >>>>> logic to send the fail records into another queue for future
> > > >> >> processing.
> > > >> >>>>> Would people think that would be a big issue?
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> Guozhang
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me>
> > > wrote:
> > > >> >>>>>
> > > >> >>>>>> Hello all,
> > > >> >>>>>>
> > > >> >>>>>> I've updated the KIP based on this conversation, and made it
> so
> > > >> that
> > > >> >>> its
> > > >> >>>>>> interface, config setting, and parameters line up more
> closely
> > > with
> > > >> >>> the
> > > >> >>>>>> interface in KIP-161 (deserialization handler).
> > > >> >>>>>>
> > > >> >>>>>> I believe there are a few specific questions I need to reply
> > > >> to.....
> > > >> >>>>>>
> > > >> >>>>>>> The question I had about then handle parameters are around
> the
> > > >> >>> record,
> > > >> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> generics
> > of
> > > >> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > `ProducerRecord<?
> > > >> >>>>> extends
> > > >> >>>>>>> Object, ? extends Object>`?
> > > >> >>>>>>
> > > >> >>>>>> At this point in the code we're guaranteed that this is a
> > > >> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just
> make
> > > it
> > > >> >>>>> harder
> > > >> >>>>>> to work with the key and value.
> > > >> >>>>>>
> > > >> >>>>>>> Also, should the handle function include the
> `RecordMetadata`
> > as
> > > >> >>> well
> > > >> >>>>> in
> > > >> >>>>>>> case it is not null?
> > > >> >>>>>>
> > > >> >>>>>> Please correct me if I'm wrong, but my understanding is that
> > the
> > > >> >>> record
> > > >> >>>>>> metadata is always null if an exception occurred while trying
> > to
> > > >> >>>>> produce.
> > > >> >>>>>>
> > > >> >>>>>>> We may probably try to write down at least the following
> > > handling
> > > >> >>>>> logic
> > > >> >>>>>> and
> > > >> >>>>>>> see if the given API is sufficient for it
> > > >> >>>>>>
> > > >> >>>>>> I've added some examples to the KIP. Let me know what you
> > think.
> > > >> >>>>>>
> > > >> >>>>>> Cheers,
> > > >> >>>>>> Matt
> > > >> >>>>>>
> > > >> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
> > > wrote:
> > > >> >>>>>>
> > > >> >>>>>>> Thanks for this feedback. I’m at a conference right now and
> am
> > > >> >>>>> planning
> > > >> >>>>>> on
> > > >> >>>>>>> updating the KIP again with details from this conversation
> > later
> > > >> >>> this
> > > >> >>>>>> week.
> > > >> >>>>>>>
> > > >> >>>>>>> I’ll shoot you a more detailed response then! :)
> > > >> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> > > wangguoz@gmail.com
> > > >> >>>
> > > >> >>>>>> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>>> Thanks for the KIP Matt.
> > > >> >>>>>>>>
> > > >> >>>>>>>> Regarding the handle interface of
> > > ProductionExceptionHandlerResp
> > > >> >>> onse,
> > > >> >>>>>>>> could
> > > >> >>>>>>>> you write it on the wiki also, along with the actual added
> > > config
> > > >> >>>>> names
> > > >> >>>>>>>> (e.g. what
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 161%3A+streams+
> > > >> >>>>>> deserialization+exception+handlers
> > > >> >>>>>>>> described).
> > > >> >>>>>>>>
> > > >> >>>>>>>> The question I had about then handle parameters are around
> > the
> > > >> >>>>> record,
> > > >> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be
> generics
> > > of
> > > >> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> > `ProducerRecord<?
> > > >> >>>>> extends
> > > >> >>>>>>>> Object, ? extends Object>`?
> > > >> >>>>>>>>
> > > >> >>>>>>>> Also, should the handle function include the
> `RecordMetadata`
> > > as
> > > >> >>>>> well in
> > > >> >>>>>>>> case it is not null?
> > > >> >>>>>>>>
> > > >> >>>>>>>> We may probably try to write down at least the following
> > > handling
> > > >> >>>>> logic
> > > >> >>>>>>>> and
> > > >> >>>>>>>> see if the given API is sufficient for it: 1) throw
> exception
> > > >> >>>>>> immediately
> > > >> >>>>>>>> to fail fast and stop the world, 2) log the error and drop
> > > record
> > > >> >>> and
> > > >> >>>>>>>> proceed silently, 3) send such errors to a specific "error"
> > > Kafka
> > > >> >>>>> topic,
> > > >> >>>>>>>> or
> > > >> >>>>>>>> record it as an app-level metrics (
> > > >> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> > > monitoring
> > > >> >> )
> > > >> >>>>> for
> > > >> >>>>>>>> monitoring.
> > > >> >>>>>>>>
> > > >> >>>>>>>> Guozhang
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <matt@frmr.me
> >
> > > >> >> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>> I did some more digging tonight.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> @Ted: It looks like the deserialization handler uses
> > > >> >>>>>>>>> "default.deserialization.exception.handler" for the
> config
> > > >> >>> name. No
> > > >> >>>>>>>>> ".class" on the end. I'm inclined to think this should use
> > > >> >>>>>>>>> "default.production.exception.handler".
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <matt@frmr.me
> >
> > > >> >>> wrote:
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> Okay, I've dug into this a little bit.
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> I think getting access to the serialized record is
> > possible,
> > > >> >>> and
> > > >> >>>>>>>> changing
> > > >> >>>>>>>>>> the naming and return type is certainly doable. However,
> > > >> >>> because
> > > >> >>>>>> we're
> > > >> >>>>>>>>>> hooking into the onCompletion callback we have no
> guarantee
> > > >> >>> that
> > > >> >>>>> the
> > > >> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
> > > >> >>> particular
> > > >> >>>>>>>> handler
> > > >> >>>>>>>>>> runs. So I think the signature would change to something
> > > >> >> like:
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> > > >> >>>>> ProducerRecord<..>
> > > >> >>>>>>>>> record,
> > > >> >>>>>>>>>> final Exception exception)
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Would this be acceptable?
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <
> matt@frmr.me>
> > > >> >>>>> wrote:
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return
> > type
> > > >> >>> but
> > > >> >>>>>> I’m
> > > >> >>>>>>>> not
> > > >> >>>>>>>>>>> sure if I can get my hands on the context and the record
> > > >> >>> itself
> > > >> >>>>>>>> without
> > > >> >>>>>>>>>>> other changes.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
> > > >> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> > > >> >>>>>>>> matthias@confluent.io>
> > > >> >>>>>>>>>>> wrote:
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>>> Thanks for the KIP.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> Are you familiar with KIP-161?
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 161%3A+streams+
> > > >> >>>>>>>>> deserialization+exception+handlers
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> I thinks, we should align the design (parameter naming,
> > > >> >>> return
> > > >> >>>>>>>> types,
> > > >> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
> > > >> >> user
> > > >> >>>>>>>>> experience.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> -Matthias
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > > >> >>>>>>>>>>>>> I’ll create the JIRA ticket.
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>> I think that config name will work. I’ll update the
> KIP
> > > >> >>>>>>>> accordingly.
> > > >> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> > > >> >>> yuzhihong@gmail.com>
> > > >> >>>>>>>> wrote:
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> For the new config, how about naming it
> > > >> >>>>>>>>>>>>>> production.exception.processor.class
> > > >> >>>>>>>>>>>>>> ? This way it is clear that class name should be
> > > >> >>> specified.
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> Cheers
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> > > >> >>> matt@frmr.me>
> > > >> >>>>>>>> wrote:
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>> Hello everyone,
> > > >> >>>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I
> just
> > > >> >>> filed
> > > >> >>>>>>>> here:
> > > >> >>>>>>>>>>>>>>>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+
> error+handling++when+Kafka+
> > > >> >>>>>>>>>>>>>>> Streams+fails+to+produce
> > > >> >>>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
> > > >> >> about
> > > >> >>>>> this
> > > >> >>>>>>>> idea
> > > >> >>>>>>>>>>>> and
> > > >> >>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
> > > >> >>>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>> Cheers,
> > > >> >>>>>>>>>>>>>>> Matt Farmer
> > > >> >>>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>> --
> > > >> >>>>>>>> -- Guozhang
> > > >> >>>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> --
> > > >> >>>>> -- Guozhang
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> -- Guozhang
> > > >> >>
> > > >> >
> > > >>
> > > >>
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Interesting. I'm not sure I agree. I've been bitten many times by
unintentionally shipping code that fails to properly implement logging. I
always discover this at the exact *worst* moment, too. (Normally at 3 AM
during an on-call shift. Hah.) However, if others feel the same way I could
probably be convinced to remove it.

We could also meet halfway and say that when a customized
ProductionExceptionHandler instructs Streams to CONTINUE, we log at DEBUG
level instead of WARN level. Would that alternative be appealing to you?

On Sun, Nov 5, 2017 at 12:32 PM Guozhang Wang <wa...@gmail.com> wrote:

> Thanks for the updates. I made a pass over the wiki again and it looks
> good.
>
> About whether record collector should still internally log the error in
> addition to what the customized ProductionExceptionHandler does. I
> personally would prefer only to log if the returned value is FAIL to
> indicate that this thread is going to shutdown and trigger the exception
> handler.
>
>
> Guozhang
>
>
> On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:
>
> > Hello, a bit later than I'd anticipated, but I've updated this KIP as
> > outlined above. The updated KIP is now ready for review again!
> >
> > On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
> >
> > > Ah. I actually created both of those in the PR and forgot to mention
> them
> > > by name in the KIP! Thanks for pointing out the oversight.
> > >
> > > I’ll revise the KIP this afternoon accordingly.
> > >
> > > The logging is actually provided for in the record collector. Whenever
> a
> > > handler continues it’ll log a warning to ensure that it’s *impossible*
> to
> > > write a handler that totally suppresses production exceptions from the
> > log.
> > > As such, the default continue handler just returns the continue value.
> I
> > > can add details about those semantics to the KIP as well.
> > > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > >> One more comment.
> > >>
> > >> You mention a default implementation for the handler that fails. I
> > >> think, this should be part of the public API and thus should have a
> > >> proper defined named that is mentioned in the KIP.
> > >>
> > >> We could also add a second implementation for the log-and-move-on
> > >> strategy, as both are the two most common cases. This class should
> also
> > >> be part of public API (so users can just set in the config) with a
> > >> proper name.
> > >>
> > >>
> > >> Otherwise, I like the KIP a lot! Thanks.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > >> > Thanks for the heads up. Yes, I think my changes are compatible with
> > >> that
> > >> > PR, but there will be a merge conflict that happens whenever one of
> > the
> > >> PRs
> > >> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> > >> first. :)
> > >> >
> > >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> That sounds reasonable, thanks Matt.
> > >> >>
> > >> >> As for the implementation, please note that there is another
> ongoing
> > PR
> > >> >> that may touch the same classes that you are working on:
> > >> >> https://github.com/apache/kafka/pull/4148
> > >> >>
> > >> >> So it may help if you can also take a look at that PR and see if it
> > is
> > >> >> compatible with your changes.
> > >> >>
> > >> >>
> > >> >>
> > >> >> Guozhang
> > >> >>
> > >> >>
> > >> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me>
> wrote:
> > >> >>
> > >> >>> I've opened this pull request to implement the KIP as currently
> > >> written:
> > >> >>> https://github.com/apache/kafka/pull/4165. It still needs some
> > tests
> > >> >>> added,
> > >> >>> but largely represents the shape I was going for.
> > >> >>>
> > >> >>> If there are more points that folks would like to discuss, please
> > let
> > >> me
> > >> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
> > >> start
> > >> >> a
> > >> >>> [VOTE] thread.
> > >> >>>
> > >> >>> Thanks,
> > >> >>> Matt
> > >> >>>
> > >> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> > >> >>>
> > >> >>>> I can’t think of a reason that would be problematic.
> > >> >>>>
> > >> >>>> Most of the time I would write a handler like this, I either want
> > to
> > >> >>>> ignore the error or fail and bring everything down so that I can
> > spin
> > >> >> it
> > >> >>>> back up later and resume from earlier offsets. When we start up
> > after
> > >> >>>> crashing we’ll eventually try to process the message we failed to
> > >> >> produce
> > >> >>>> again.
> > >> >>>>
> > >> >>>> I’m concerned that “putting in a queue for later” opens you up to
> > >> >> putting
> > >> >>>> messages into the destination topic in an unexpected order.
> However
> > >> if
> > >> >>>> others feel differently, I’m happy to talk about it.
> > >> >>>>
> > >> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <
> wangguoz@gmail.com>
> > >> >>> wrote:
> > >> >>>>
> > >> >>>>>> Please correct me if I'm wrong, but my understanding is that
> the
> > >> >>> record
> > >> >>>>>> metadata is always null if an exception occurred while trying
> to
> > >> >>>>> produce.
> > >> >>>>>
> > >> >>>>> That is right. Thanks.
> > >> >>>>>
> > >> >>>>> I looked at the example code, and one thing I realized that
> since
> > we
> > >> >> are
> > >> >>>>> not passing the context in the handle function, we may not be
> > >> >> implement
> > >> >>>>> the
> > >> >>>>> logic to send the fail records into another queue for future
> > >> >> processing.
> > >> >>>>> Would people think that would be a big issue?
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> Guozhang
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me>
> > wrote:
> > >> >>>>>
> > >> >>>>>> Hello all,
> > >> >>>>>>
> > >> >>>>>> I've updated the KIP based on this conversation, and made it so
> > >> that
> > >> >>> its
> > >> >>>>>> interface, config setting, and parameters line up more closely
> > with
> > >> >>> the
> > >> >>>>>> interface in KIP-161 (deserialization handler).
> > >> >>>>>>
> > >> >>>>>> I believe there are a few specific questions I need to reply
> > >> to.....
> > >> >>>>>>
> > >> >>>>>>> The question I had about then handle parameters are around the
> > >> >>> record,
> > >> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics
> of
> > >> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> `ProducerRecord<?
> > >> >>>>> extends
> > >> >>>>>>> Object, ? extends Object>`?
> > >> >>>>>>
> > >> >>>>>> At this point in the code we're guaranteed that this is a
> > >> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just make
> > it
> > >> >>>>> harder
> > >> >>>>>> to work with the key and value.
> > >> >>>>>>
> > >> >>>>>>> Also, should the handle function include the `RecordMetadata`
> as
> > >> >>> well
> > >> >>>>> in
> > >> >>>>>>> case it is not null?
> > >> >>>>>>
> > >> >>>>>> Please correct me if I'm wrong, but my understanding is that
> the
> > >> >>> record
> > >> >>>>>> metadata is always null if an exception occurred while trying
> to
> > >> >>>>> produce.
> > >> >>>>>>
> > >> >>>>>>> We may probably try to write down at least the following
> > handling
> > >> >>>>> logic
> > >> >>>>>> and
> > >> >>>>>>> see if the given API is sufficient for it
> > >> >>>>>>
> > >> >>>>>> I've added some examples to the KIP. Let me know what you
> think.
> > >> >>>>>>
> > >> >>>>>> Cheers,
> > >> >>>>>> Matt
> > >> >>>>>>
> > >> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
> > wrote:
> > >> >>>>>>
> > >> >>>>>>> Thanks for this feedback. I’m at a conference right now and am
> > >> >>>>> planning
> > >> >>>>>> on
> > >> >>>>>>> updating the KIP again with details from this conversation
> later
> > >> >>> this
> > >> >>>>>> week.
> > >> >>>>>>>
> > >> >>>>>>> I’ll shoot you a more detailed response then! :)
> > >> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> > wangguoz@gmail.com
> > >> >>>
> > >> >>>>>> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Thanks for the KIP Matt.
> > >> >>>>>>>>
> > >> >>>>>>>> Regarding the handle interface of
> > ProductionExceptionHandlerResp
> > >> >>> onse,
> > >> >>>>>>>> could
> > >> >>>>>>>> you write it on the wiki also, along with the actual added
> > config
> > >> >>>>> names
> > >> >>>>>>>> (e.g. what
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > >> >>>>>> deserialization+exception+handlers
> > >> >>>>>>>> described).
> > >> >>>>>>>>
> > >> >>>>>>>> The question I had about then handle parameters are around
> the
> > >> >>>>> record,
> > >> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics
> > of
> > >> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or
> `ProducerRecord<?
> > >> >>>>> extends
> > >> >>>>>>>> Object, ? extends Object>`?
> > >> >>>>>>>>
> > >> >>>>>>>> Also, should the handle function include the `RecordMetadata`
> > as
> > >> >>>>> well in
> > >> >>>>>>>> case it is not null?
> > >> >>>>>>>>
> > >> >>>>>>>> We may probably try to write down at least the following
> > handling
> > >> >>>>> logic
> > >> >>>>>>>> and
> > >> >>>>>>>> see if the given API is sufficient for it: 1) throw exception
> > >> >>>>>> immediately
> > >> >>>>>>>> to fail fast and stop the world, 2) log the error and drop
> > record
> > >> >>> and
> > >> >>>>>>>> proceed silently, 3) send such errors to a specific "error"
> > Kafka
> > >> >>>>> topic,
> > >> >>>>>>>> or
> > >> >>>>>>>> record it as an app-level metrics (
> > >> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> > monitoring
> > >> >> )
> > >> >>>>> for
> > >> >>>>>>>> monitoring.
> > >> >>>>>>>>
> > >> >>>>>>>> Guozhang
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
> > >> >> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> I did some more digging tonight.
> > >> >>>>>>>>>
> > >> >>>>>>>>> @Ted: It looks like the deserialization handler uses
> > >> >>>>>>>>> "default.deserialization.exception.handler" for the config
> > >> >>> name. No
> > >> >>>>>>>>> ".class" on the end. I'm inclined to think this should use
> > >> >>>>>>>>> "default.production.exception.handler".
> > >> >>>>>>>>>
> > >> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
> > >> >>> wrote:
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Okay, I've dug into this a little bit.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> I think getting access to the serialized record is
> possible,
> > >> >>> and
> > >> >>>>>>>> changing
> > >> >>>>>>>>>> the naming and return type is certainly doable. However,
> > >> >>> because
> > >> >>>>>> we're
> > >> >>>>>>>>>> hooking into the onCompletion callback we have no guarantee
> > >> >>> that
> > >> >>>>> the
> > >> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
> > >> >>> particular
> > >> >>>>>>>> handler
> > >> >>>>>>>>>> runs. So I think the signature would change to something
> > >> >> like:
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> > >> >>>>> ProducerRecord<..>
> > >> >>>>>>>>> record,
> > >> >>>>>>>>>> final Exception exception)
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Would this be acceptable?
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> > >> >>>>> wrote:
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return
> type
> > >> >>> but
> > >> >>>>>> I’m
> > >> >>>>>>>> not
> > >> >>>>>>>>>>> sure if I can get my hands on the context and the record
> > >> >>> itself
> > >> >>>>>>>> without
> > >> >>>>>>>>>>> other changes.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
> > >> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> > >> >>>>>>>> matthias@confluent.io>
> > >> >>>>>>>>>>> wrote:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> Thanks for the KIP.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> Are you familiar with KIP-161?
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > >> >>>>>>>>> deserialization+exception+handlers
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I thinks, we should align the design (parameter naming,
> > >> >>> return
> > >> >>>>>>>> types,
> > >> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
> > >> >> user
> > >> >>>>>>>>> experience.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> -Matthias
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >> >>>>>>>>>>>>> I’ll create the JIRA ticket.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> I think that config name will work. I’ll update the KIP
> > >> >>>>>>>> accordingly.
> > >> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> > >> >>> yuzhihong@gmail.com>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> For the new config, how about naming it
> > >> >>>>>>>>>>>>>> production.exception.processor.class
> > >> >>>>>>>>>>>>>> ? This way it is clear that class name should be
> > >> >>> specified.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Cheers
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> > >> >>> matt@frmr.me>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> Hello everyone,
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I just
> > >> >>> filed
> > >> >>>>>>>> here:
> > >> >>>>>>>>>>>>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> > >> >>>>>>>>>>>>>>> Streams+fails+to+produce
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
> > >> >> about
> > >> >>>>> this
> > >> >>>>>>>> idea
> > >> >>>>>>>>>>>> and
> > >> >>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> Cheers,
> > >> >>>>>>>>>>>>>>> Matt Farmer
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> --
> > >> >>>>>>>> -- Guozhang
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> --
> > >> >>>>> -- Guozhang
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> -- Guozhang
> > >> >>
> > >> >
> > >>
> > >>
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks for the updates. I made a pass over the wiki again and it looks good.

About whether record collector should still internally log the error in
addition to what the customized ProductionExceptionHandler does. I
personally would prefer only to log if the returned value is FAIL to
indicate that this thread is going to shutdown and trigger the exception
handler.


Guozhang


On Sun, Nov 5, 2017 at 6:09 AM, Matt Farmer <ma...@frmr.me> wrote:

> Hello, a bit later than I'd anticipated, but I've updated this KIP as
> outlined above. The updated KIP is now ready for review again!
>
> On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:
>
> > Ah. I actually created both of those in the PR and forgot to mention them
> > by name in the KIP! Thanks for pointing out the oversight.
> >
> > I’ll revise the KIP this afternoon accordingly.
> >
> > The logging is actually provided for in the record collector. Whenever a
> > handler continues it’ll log a warning to ensure that it’s *impossible* to
> > write a handler that totally suppresses production exceptions from the
> log.
> > As such, the default continue handler just returns the continue value. I
> > can add details about those semantics to the KIP as well.
> > On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> One more comment.
> >>
> >> You mention a default implementation for the handler that fails. I
> >> think, this should be part of the public API and thus should have a
> >> proper defined named that is mentioned in the KIP.
> >>
> >> We could also add a second implementation for the log-and-move-on
> >> strategy, as both are the two most common cases. This class should also
> >> be part of public API (so users can just set in the config) with a
> >> proper name.
> >>
> >>
> >> Otherwise, I like the KIP a lot! Thanks.
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 11/1/17 12:23 AM, Matt Farmer wrote:
> >> > Thanks for the heads up. Yes, I think my changes are compatible with
> >> that
> >> > PR, but there will be a merge conflict that happens whenever one of
> the
> >> PRs
> >> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> >> first. :)
> >> >
> >> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com>
> >> wrote:
> >> >
> >> >> That sounds reasonable, thanks Matt.
> >> >>
> >> >> As for the implementation, please note that there is another ongoing
> PR
> >> >> that may touch the same classes that you are working on:
> >> >> https://github.com/apache/kafka/pull/4148
> >> >>
> >> >> So it may help if you can also take a look at that PR and see if it
> is
> >> >> compatible with your changes.
> >> >>
> >> >>
> >> >>
> >> >> Guozhang
> >> >>
> >> >>
> >> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:
> >> >>
> >> >>> I've opened this pull request to implement the KIP as currently
> >> written:
> >> >>> https://github.com/apache/kafka/pull/4165. It still needs some
> tests
> >> >>> added,
> >> >>> but largely represents the shape I was going for.
> >> >>>
> >> >>> If there are more points that folks would like to discuss, please
> let
> >> me
> >> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
> >> start
> >> >> a
> >> >>> [VOTE] thread.
> >> >>>
> >> >>> Thanks,
> >> >>> Matt
> >> >>>
> >> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> >> >>>
> >> >>>> I can’t think of a reason that would be problematic.
> >> >>>>
> >> >>>> Most of the time I would write a handler like this, I either want
> to
> >> >>>> ignore the error or fail and bring everything down so that I can
> spin
> >> >> it
> >> >>>> back up later and resume from earlier offsets. When we start up
> after
> >> >>>> crashing we’ll eventually try to process the message we failed to
> >> >> produce
> >> >>>> again.
> >> >>>>
> >> >>>> I’m concerned that “putting in a queue for later” opens you up to
> >> >> putting
> >> >>>> messages into the destination topic in an unexpected order. However
> >> if
> >> >>>> others feel differently, I’m happy to talk about it.
> >> >>>>
> >> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
> >> >>> wrote:
> >> >>>>
> >> >>>>>> Please correct me if I'm wrong, but my understanding is that the
> >> >>> record
> >> >>>>>> metadata is always null if an exception occurred while trying to
> >> >>>>> produce.
> >> >>>>>
> >> >>>>> That is right. Thanks.
> >> >>>>>
> >> >>>>> I looked at the example code, and one thing I realized that since
> we
> >> >> are
> >> >>>>> not passing the context in the handle function, we may not be
> >> >> implement
> >> >>>>> the
> >> >>>>> logic to send the fail records into another queue for future
> >> >> processing.
> >> >>>>> Would people think that would be a big issue?
> >> >>>>>
> >> >>>>>
> >> >>>>> Guozhang
> >> >>>>>
> >> >>>>>
> >> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me>
> wrote:
> >> >>>>>
> >> >>>>>> Hello all,
> >> >>>>>>
> >> >>>>>> I've updated the KIP based on this conversation, and made it so
> >> that
> >> >>> its
> >> >>>>>> interface, config setting, and parameters line up more closely
> with
> >> >>> the
> >> >>>>>> interface in KIP-161 (deserialization handler).
> >> >>>>>>
> >> >>>>>> I believe there are a few specific questions I need to reply
> >> to.....
> >> >>>>>>
> >> >>>>>>> The question I had about then handle parameters are around the
> >> >>> record,
> >> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >> >>>>> extends
> >> >>>>>>> Object, ? extends Object>`?
> >> >>>>>>
> >> >>>>>> At this point in the code we're guaranteed that this is a
> >> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just make
> it
> >> >>>>> harder
> >> >>>>>> to work with the key and value.
> >> >>>>>>
> >> >>>>>>> Also, should the handle function include the `RecordMetadata` as
> >> >>> well
> >> >>>>> in
> >> >>>>>>> case it is not null?
> >> >>>>>>
> >> >>>>>> Please correct me if I'm wrong, but my understanding is that the
> >> >>> record
> >> >>>>>> metadata is always null if an exception occurred while trying to
> >> >>>>> produce.
> >> >>>>>>
> >> >>>>>>> We may probably try to write down at least the following
> handling
> >> >>>>> logic
> >> >>>>>> and
> >> >>>>>>> see if the given API is sufficient for it
> >> >>>>>>
> >> >>>>>> I've added some examples to the KIP. Let me know what you think.
> >> >>>>>>
> >> >>>>>> Cheers,
> >> >>>>>> Matt
> >> >>>>>>
> >> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me>
> wrote:
> >> >>>>>>
> >> >>>>>>> Thanks for this feedback. I’m at a conference right now and am
> >> >>>>> planning
> >> >>>>>> on
> >> >>>>>>> updating the KIP again with details from this conversation later
> >> >>> this
> >> >>>>>> week.
> >> >>>>>>>
> >> >>>>>>> I’ll shoot you a more detailed response then! :)
> >> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <
> wangguoz@gmail.com
> >> >>>
> >> >>>>>> wrote:
> >> >>>>>>>
> >> >>>>>>>> Thanks for the KIP Matt.
> >> >>>>>>>>
> >> >>>>>>>> Regarding the handle interface of
> ProductionExceptionHandlerResp
> >> >>> onse,
> >> >>>>>>>> could
> >> >>>>>>>> you write it on the wiki also, along with the actual added
> config
> >> >>>>> names
> >> >>>>>>>> (e.g. what
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >> >>>>>> deserialization+exception+handlers
> >> >>>>>>>> described).
> >> >>>>>>>>
> >> >>>>>>>> The question I had about then handle parameters are around the
> >> >>>>> record,
> >> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics
> of
> >> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >> >>>>> extends
> >> >>>>>>>> Object, ? extends Object>`?
> >> >>>>>>>>
> >> >>>>>>>> Also, should the handle function include the `RecordMetadata`
> as
> >> >>>>> well in
> >> >>>>>>>> case it is not null?
> >> >>>>>>>>
> >> >>>>>>>> We may probably try to write down at least the following
> handling
> >> >>>>> logic
> >> >>>>>>>> and
> >> >>>>>>>> see if the given API is sufficient for it: 1) throw exception
> >> >>>>>> immediately
> >> >>>>>>>> to fail fast and stop the world, 2) log the error and drop
> record
> >> >>> and
> >> >>>>>>>> proceed silently, 3) send such errors to a specific "error"
> Kafka
> >> >>>>> topic,
> >> >>>>>>>> or
> >> >>>>>>>> record it as an app-level metrics (
> >> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_
> monitoring
> >> >> )
> >> >>>>> for
> >> >>>>>>>> monitoring.
> >> >>>>>>>>
> >> >>>>>>>> Guozhang
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
> >> >> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> I did some more digging tonight.
> >> >>>>>>>>>
> >> >>>>>>>>> @Ted: It looks like the deserialization handler uses
> >> >>>>>>>>> "default.deserialization.exception.handler" for the config
> >> >>> name. No
> >> >>>>>>>>> ".class" on the end. I'm inclined to think this should use
> >> >>>>>>>>> "default.production.exception.handler".
> >> >>>>>>>>>
> >> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
> >> >>> wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Okay, I've dug into this a little bit.
> >> >>>>>>>>>>
> >> >>>>>>>>>> I think getting access to the serialized record is possible,
> >> >>> and
> >> >>>>>>>> changing
> >> >>>>>>>>>> the naming and return type is certainly doable. However,
> >> >>> because
> >> >>>>>> we're
> >> >>>>>>>>>> hooking into the onCompletion callback we have no guarantee
> >> >>> that
> >> >>>>> the
> >> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
> >> >>> particular
> >> >>>>>>>> handler
> >> >>>>>>>>>> runs. So I think the signature would change to something
> >> >> like:
> >> >>>>>>>>>>
> >> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> >> >>>>> ProducerRecord<..>
> >> >>>>>>>>> record,
> >> >>>>>>>>>> final Exception exception)
> >> >>>>>>>>>>
> >> >>>>>>>>>> Would this be acceptable?
> >> >>>>>>>>>>
> >> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> >> >>>>> wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return type
> >> >>> but
> >> >>>>>> I’m
> >> >>>>>>>> not
> >> >>>>>>>>>>> sure if I can get my hands on the context and the record
> >> >>> itself
> >> >>>>>>>> without
> >> >>>>>>>>>>> other changes.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >> >>>>>>>> matthias@confluent.io>
> >> >>>>>>>>>>> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Thanks for the KIP.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Are you familiar with KIP-161?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>
> >> >>>>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >> >>>>>>>>> deserialization+exception+handlers
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I thinks, we should align the design (parameter naming,
> >> >>> return
> >> >>>>>>>> types,
> >> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
> >> >> user
> >> >>>>>>>>> experience.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> -Matthias
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >> >>>>>>>>>>>>> I’ll create the JIRA ticket.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> I think that config name will work. I’ll update the KIP
> >> >>>>>>>> accordingly.
> >> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >> >>> yuzhihong@gmail.com>
> >> >>>>>>>> wrote:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> For the new config, how about naming it
> >> >>>>>>>>>>>>>> production.exception.processor.class
> >> >>>>>>>>>>>>>> ? This way it is clear that class name should be
> >> >>> specified.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Cheers
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> >> >>> matt@frmr.me>
> >> >>>>>>>> wrote:
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> Hello everyone,
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I just
> >> >>> filed
> >> >>>>>>>> here:
> >> >>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >> >>>>>>>>>>>>>>> Streams+fails+to+produce
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
> >> >> about
> >> >>>>> this
> >> >>>>>>>> idea
> >> >>>>>>>>>>>> and
> >> >>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>>>>> Matt Farmer
> >> >>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> -- Guozhang
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>> -- Guozhang
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> -- Guozhang
> >> >>
> >> >
> >>
> >>
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Hello, a bit later than I'd anticipated, but I've updated this KIP as
outlined above. The updated KIP is now ready for review again!

On Sat, Nov 4, 2017 at 1:03 PM Matt Farmer <ma...@frmr.me> wrote:

> Ah. I actually created both of those in the PR and forgot to mention them
> by name in the KIP! Thanks for pointing out the oversight.
>
> I’ll revise the KIP this afternoon accordingly.
>
> The logging is actually provided for in the record collector. Whenever a
> handler continues it’ll log a warning to ensure that it’s *impossible* to
> write a handler that totally suppresses production exceptions from the log.
> As such, the default continue handler just returns the continue value. I
> can add details about those semantics to the KIP as well.
> On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
>> One more comment.
>>
>> You mention a default implementation for the handler that fails. I
>> think, this should be part of the public API and thus should have a
>> proper defined named that is mentioned in the KIP.
>>
>> We could also add a second implementation for the log-and-move-on
>> strategy, as both are the two most common cases. This class should also
>> be part of public API (so users can just set in the config) with a
>> proper name.
>>
>>
>> Otherwise, I like the KIP a lot! Thanks.
>>
>>
>> -Matthias
>>
>>
>> On 11/1/17 12:23 AM, Matt Farmer wrote:
>> > Thanks for the heads up. Yes, I think my changes are compatible with
>> that
>> > PR, but there will be a merge conflict that happens whenever one of the
>> PRs
>> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
>> first. :)
>> >
>> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>> >
>> >> That sounds reasonable, thanks Matt.
>> >>
>> >> As for the implementation, please note that there is another ongoing PR
>> >> that may touch the same classes that you are working on:
>> >> https://github.com/apache/kafka/pull/4148
>> >>
>> >> So it may help if you can also take a look at that PR and see if it is
>> >> compatible with your changes.
>> >>
>> >>
>> >>
>> >> Guozhang
>> >>
>> >>
>> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:
>> >>
>> >>> I've opened this pull request to implement the KIP as currently
>> written:
>> >>> https://github.com/apache/kafka/pull/4165. It still needs some tests
>> >>> added,
>> >>> but largely represents the shape I was going for.
>> >>>
>> >>> If there are more points that folks would like to discuss, please let
>> me
>> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
>> start
>> >> a
>> >>> [VOTE] thread.
>> >>>
>> >>> Thanks,
>> >>> Matt
>> >>>
>> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
>> >>>
>> >>>> I can’t think of a reason that would be problematic.
>> >>>>
>> >>>> Most of the time I would write a handler like this, I either want to
>> >>>> ignore the error or fail and bring everything down so that I can spin
>> >> it
>> >>>> back up later and resume from earlier offsets. When we start up after
>> >>>> crashing we’ll eventually try to process the message we failed to
>> >> produce
>> >>>> again.
>> >>>>
>> >>>> I’m concerned that “putting in a queue for later” opens you up to
>> >> putting
>> >>>> messages into the destination topic in an unexpected order. However
>> if
>> >>>> others feel differently, I’m happy to talk about it.
>> >>>>
>> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>>> Please correct me if I'm wrong, but my understanding is that the
>> >>> record
>> >>>>>> metadata is always null if an exception occurred while trying to
>> >>>>> produce.
>> >>>>>
>> >>>>> That is right. Thanks.
>> >>>>>
>> >>>>> I looked at the example code, and one thing I realized that since we
>> >> are
>> >>>>> not passing the context in the handle function, we may not be
>> >> implement
>> >>>>> the
>> >>>>> logic to send the fail records into another queue for future
>> >> processing.
>> >>>>> Would people think that would be a big issue?
>> >>>>>
>> >>>>>
>> >>>>> Guozhang
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
>> >>>>>
>> >>>>>> Hello all,
>> >>>>>>
>> >>>>>> I've updated the KIP based on this conversation, and made it so
>> that
>> >>> its
>> >>>>>> interface, config setting, and parameters line up more closely with
>> >>> the
>> >>>>>> interface in KIP-161 (deserialization handler).
>> >>>>>>
>> >>>>>> I believe there are a few specific questions I need to reply
>> to.....
>> >>>>>>
>> >>>>>>> The question I had about then handle parameters are around the
>> >>> record,
>> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>> >>>>> extends
>> >>>>>>> Object, ? extends Object>`?
>> >>>>>>
>> >>>>>> At this point in the code we're guaranteed that this is a
>> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just make it
>> >>>>> harder
>> >>>>>> to work with the key and value.
>> >>>>>>
>> >>>>>>> Also, should the handle function include the `RecordMetadata` as
>> >>> well
>> >>>>> in
>> >>>>>>> case it is not null?
>> >>>>>>
>> >>>>>> Please correct me if I'm wrong, but my understanding is that the
>> >>> record
>> >>>>>> metadata is always null if an exception occurred while trying to
>> >>>>> produce.
>> >>>>>>
>> >>>>>>> We may probably try to write down at least the following handling
>> >>>>> logic
>> >>>>>> and
>> >>>>>>> see if the given API is sufficient for it
>> >>>>>>
>> >>>>>> I've added some examples to the KIP. Let me know what you think.
>> >>>>>>
>> >>>>>> Cheers,
>> >>>>>> Matt
>> >>>>>>
>> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
>> >>>>>>
>> >>>>>>> Thanks for this feedback. I’m at a conference right now and am
>> >>>>> planning
>> >>>>>> on
>> >>>>>>> updating the KIP again with details from this conversation later
>> >>> this
>> >>>>>> week.
>> >>>>>>>
>> >>>>>>> I’ll shoot you a more detailed response then! :)
>> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangguoz@gmail.com
>> >>>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Thanks for the KIP Matt.
>> >>>>>>>>
>> >>>>>>>> Regarding the handle interface of ProductionExceptionHandlerResp
>> >>> onse,
>> >>>>>>>> could
>> >>>>>>>> you write it on the wiki also, along with the actual added config
>> >>>>> names
>> >>>>>>>> (e.g. what
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> >>>>>> deserialization+exception+handlers
>> >>>>>>>> described).
>> >>>>>>>>
>> >>>>>>>> The question I had about then handle parameters are around the
>> >>>>> record,
>> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>> >>>>> extends
>> >>>>>>>> Object, ? extends Object>`?
>> >>>>>>>>
>> >>>>>>>> Also, should the handle function include the `RecordMetadata` as
>> >>>>> well in
>> >>>>>>>> case it is not null?
>> >>>>>>>>
>> >>>>>>>> We may probably try to write down at least the following handling
>> >>>>> logic
>> >>>>>>>> and
>> >>>>>>>> see if the given API is sufficient for it: 1) throw exception
>> >>>>>> immediately
>> >>>>>>>> to fail fast and stop the world, 2) log the error and drop record
>> >>> and
>> >>>>>>>> proceed silently, 3) send such errors to a specific "error" Kafka
>> >>>>> topic,
>> >>>>>>>> or
>> >>>>>>>> record it as an app-level metrics (
>> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_monitoring
>> >> )
>> >>>>> for
>> >>>>>>>> monitoring.
>> >>>>>>>>
>> >>>>>>>> Guozhang
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
>> >> wrote:
>> >>>>>>>>
>> >>>>>>>>> I did some more digging tonight.
>> >>>>>>>>>
>> >>>>>>>>> @Ted: It looks like the deserialization handler uses
>> >>>>>>>>> "default.deserialization.exception.handler" for the config
>> >>> name. No
>> >>>>>>>>> ".class" on the end. I'm inclined to think this should use
>> >>>>>>>>> "default.production.exception.handler".
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
>> >>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Okay, I've dug into this a little bit.
>> >>>>>>>>>>
>> >>>>>>>>>> I think getting access to the serialized record is possible,
>> >>> and
>> >>>>>>>> changing
>> >>>>>>>>>> the naming and return type is certainly doable. However,
>> >>> because
>> >>>>>> we're
>> >>>>>>>>>> hooking into the onCompletion callback we have no guarantee
>> >>> that
>> >>>>> the
>> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
>> >>> particular
>> >>>>>>>> handler
>> >>>>>>>>>> runs. So I think the signature would change to something
>> >> like:
>> >>>>>>>>>>
>> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>> >>>>> ProducerRecord<..>
>> >>>>>>>>> record,
>> >>>>>>>>>> final Exception exception)
>> >>>>>>>>>>
>> >>>>>>>>>> Would this be acceptable?
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>> >>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return type
>> >>> but
>> >>>>>> I’m
>> >>>>>>>> not
>> >>>>>>>>>>> sure if I can get my hands on the context and the record
>> >>> itself
>> >>>>>>>> without
>> >>>>>>>>>>> other changes.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
>> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> >>>>>>>> matthias@confluent.io>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Thanks for the KIP.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Are you familiar with KIP-161?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>
>> >>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> >>>>>>>>> deserialization+exception+handlers
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I thinks, we should align the design (parameter naming,
>> >>> return
>> >>>>>>>> types,
>> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
>> >> user
>> >>>>>>>>> experience.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> -Matthias
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> >>>>>>>>>>>>> I’ll create the JIRA ticket.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I think that config name will work. I’ll update the KIP
>> >>>>>>>> accordingly.
>> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>> >>> yuzhihong@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> For the new config, how about naming it
>> >>>>>>>>>>>>>> production.exception.processor.class
>> >>>>>>>>>>>>>> ? This way it is clear that class name should be
>> >>> specified.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>> >>> matt@frmr.me>
>> >>>>>>>> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hello everyone,
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I just
>> >>> filed
>> >>>>>>>> here:
>> >>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>> >>>>>>>>>>>>>>> Streams+fails+to+produce
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
>> >> about
>> >>>>> this
>> >>>>>>>> idea
>> >>>>>>>>>>>> and
>> >>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>> Matt Farmer
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> -- Guozhang
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> -- Guozhang
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> -- Guozhang
>> >>
>> >
>>
>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Ah. I actually created both of those in the PR and forgot to mention them
by name in the KIP! Thanks for pointing out the oversight.

I’ll revise the KIP this afternoon accordingly.

The logging is actually provided for in the record collector. Whenever a
handler continues it’ll log a warning to ensure that it’s *impossible* to
write a handler that totally suppresses production exceptions from the log.
As such, the default continue handler just returns the continue value. I
can add details about those semantics to the KIP as well.
On Sat, Nov 4, 2017 at 12:46 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> One more comment.
>
> You mention a default implementation for the handler that fails. I
> think, this should be part of the public API and thus should have a
> proper defined named that is mentioned in the KIP.
>
> We could also add a second implementation for the log-and-move-on
> strategy, as both are the two most common cases. This class should also
> be part of public API (so users can just set in the config) with a
> proper name.
>
>
> Otherwise, I like the KIP a lot! Thanks.
>
>
> -Matthias
>
>
> On 11/1/17 12:23 AM, Matt Farmer wrote:
> > Thanks for the heads up. Yes, I think my changes are compatible with that
> > PR, but there will be a merge conflict that happens whenever one of the
> PRs
> > is merged. Happy to reconcile the changes in my PR if 4148 goes in
> first. :)
> >
> > On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> >> That sounds reasonable, thanks Matt.
> >>
> >> As for the implementation, please note that there is another ongoing PR
> >> that may touch the same classes that you are working on:
> >> https://github.com/apache/kafka/pull/4148
> >>
> >> So it may help if you can also take a look at that PR and see if it is
> >> compatible with your changes.
> >>
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:
> >>
> >>> I've opened this pull request to implement the KIP as currently
> written:
> >>> https://github.com/apache/kafka/pull/4165. It still needs some tests
> >>> added,
> >>> but largely represents the shape I was going for.
> >>>
> >>> If there are more points that folks would like to discuss, please let
> me
> >>> know. If I don't hear anything by tomorrow afternoon I'll probably
> start
> >> a
> >>> [VOTE] thread.
> >>>
> >>> Thanks,
> >>> Matt
> >>>
> >>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> >>>
> >>>> I can’t think of a reason that would be problematic.
> >>>>
> >>>> Most of the time I would write a handler like this, I either want to
> >>>> ignore the error or fail and bring everything down so that I can spin
> >> it
> >>>> back up later and resume from earlier offsets. When we start up after
> >>>> crashing we’ll eventually try to process the message we failed to
> >> produce
> >>>> again.
> >>>>
> >>>> I’m concerned that “putting in a queue for later” opens you up to
> >> putting
> >>>> messages into the destination topic in an unexpected order. However if
> >>>> others feel differently, I’m happy to talk about it.
> >>>>
> >>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
> >>> wrote:
> >>>>
> >>>>>> Please correct me if I'm wrong, but my understanding is that the
> >>> record
> >>>>>> metadata is always null if an exception occurred while trying to
> >>>>> produce.
> >>>>>
> >>>>> That is right. Thanks.
> >>>>>
> >>>>> I looked at the example code, and one thing I realized that since we
> >> are
> >>>>> not passing the context in the handle function, we may not be
> >> implement
> >>>>> the
> >>>>> logic to send the fail records into another queue for future
> >> processing.
> >>>>> Would people think that would be a big issue?
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I've updated the KIP based on this conversation, and made it so that
> >>> its
> >>>>>> interface, config setting, and parameters line up more closely with
> >>> the
> >>>>>> interface in KIP-161 (deserialization handler).
> >>>>>>
> >>>>>> I believe there are a few specific questions I need to reply to.....
> >>>>>>
> >>>>>>> The question I had about then handle parameters are around the
> >>> record,
> >>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >>>>> extends
> >>>>>>> Object, ? extends Object>`?
> >>>>>>
> >>>>>> At this point in the code we're guaranteed that this is a
> >>>>>> ProducerRecord<byte[], byte[]>, so the generics would just make it
> >>>>> harder
> >>>>>> to work with the key and value.
> >>>>>>
> >>>>>>> Also, should the handle function include the `RecordMetadata` as
> >>> well
> >>>>> in
> >>>>>>> case it is not null?
> >>>>>>
> >>>>>> Please correct me if I'm wrong, but my understanding is that the
> >>> record
> >>>>>> metadata is always null if an exception occurred while trying to
> >>>>> produce.
> >>>>>>
> >>>>>>> We may probably try to write down at least the following handling
> >>>>> logic
> >>>>>> and
> >>>>>>> see if the given API is sufficient for it
> >>>>>>
> >>>>>> I've added some examples to the KIP. Let me know what you think.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Matt
> >>>>>>
> >>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
> >>>>>>
> >>>>>>> Thanks for this feedback. I’m at a conference right now and am
> >>>>> planning
> >>>>>> on
> >>>>>>> updating the KIP again with details from this conversation later
> >>> this
> >>>>>> week.
> >>>>>>>
> >>>>>>> I’ll shoot you a more detailed response then! :)
> >>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangguoz@gmail.com
> >>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks for the KIP Matt.
> >>>>>>>>
> >>>>>>>> Regarding the handle interface of ProductionExceptionHandlerResp
> >>> onse,
> >>>>>>>> could
> >>>>>>>> you write it on the wiki also, along with the actual added config
> >>>>> names
> >>>>>>>> (e.g. what
> >>>>>>>>
> >>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >>>>>> deserialization+exception+handlers
> >>>>>>>> described).
> >>>>>>>>
> >>>>>>>> The question I had about then handle parameters are around the
> >>>>> record,
> >>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >>>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >>>>> extends
> >>>>>>>> Object, ? extends Object>`?
> >>>>>>>>
> >>>>>>>> Also, should the handle function include the `RecordMetadata` as
> >>>>> well in
> >>>>>>>> case it is not null?
> >>>>>>>>
> >>>>>>>> We may probably try to write down at least the following handling
> >>>>> logic
> >>>>>>>> and
> >>>>>>>> see if the given API is sufficient for it: 1) throw exception
> >>>>>> immediately
> >>>>>>>> to fail fast and stop the world, 2) log the error and drop record
> >>> and
> >>>>>>>> proceed silently, 3) send such errors to a specific "error" Kafka
> >>>>> topic,
> >>>>>>>> or
> >>>>>>>> record it as an app-level metrics (
> >>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_monitoring
> >> )
> >>>>> for
> >>>>>>>> monitoring.
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> I did some more digging tonight.
> >>>>>>>>>
> >>>>>>>>> @Ted: It looks like the deserialization handler uses
> >>>>>>>>> "default.deserialization.exception.handler" for the config
> >>> name. No
> >>>>>>>>> ".class" on the end. I'm inclined to think this should use
> >>>>>>>>> "default.production.exception.handler".
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
> >>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Okay, I've dug into this a little bit.
> >>>>>>>>>>
> >>>>>>>>>> I think getting access to the serialized record is possible,
> >>> and
> >>>>>>>> changing
> >>>>>>>>>> the naming and return type is certainly doable. However,
> >>> because
> >>>>>> we're
> >>>>>>>>>> hooking into the onCompletion callback we have no guarantee
> >>> that
> >>>>> the
> >>>>>>>>>> ProcessorContext state hasn't changed by the time this
> >>> particular
> >>>>>>>> handler
> >>>>>>>>>> runs. So I think the signature would change to something
> >> like:
> >>>>>>>>>>
> >>>>>>>>>> ProductionExceptionHandlerResponse handle(final
> >>>>> ProducerRecord<..>
> >>>>>>>>> record,
> >>>>>>>>>> final Exception exception)
> >>>>>>>>>>
> >>>>>>>>>> Would this be acceptable?
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> >>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return type
> >>> but
> >>>>>> I’m
> >>>>>>>> not
> >>>>>>>>>>> sure if I can get my hands on the context and the record
> >>> itself
> >>>>>>>> without
> >>>>>>>>>>> other changes.
> >>>>>>>>>>>
> >>>>>>>>>>> Let me dig in and follow up here tomorrow.
> >>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >>>>>>>> matthias@confluent.io>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Are you familiar with KIP-161?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >>>>>>>>> deserialization+exception+handlers
> >>>>>>>>>>>>
> >>>>>>>>>>>> I thinks, we should align the design (parameter naming,
> >>> return
> >>>>>>>> types,
> >>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
> >> user
> >>>>>>>>> experience.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>>>>>>>>>>>> I’ll create the JIRA ticket.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think that config name will work. I’ll update the KIP
> >>>>>>>> accordingly.
> >>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> >>> yuzhihong@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For the new config, how about naming it
> >>>>>>>>>>>>>> production.exception.processor.class
> >>>>>>>>>>>>>> ? This way it is clear that class name should be
> >>> specified.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> >>> matt@frmr.me>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hello everyone,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is the discussion thread for the KIP that I just
> >>> filed
> >>>>>>>> here:
> >>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>>>>>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >>>>>>>>>>>>>>> Streams+fails+to+produce
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
> >> about
> >>>>> this
> >>>>>>>> idea
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Matt Farmer
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
One more comment.

You mention a default implementation for the handler that fails. I
think, this should be part of the public API and thus should have a
proper defined named that is mentioned in the KIP.

We could also add a second implementation for the log-and-move-on
strategy, as both are the two most common cases. This class should also
be part of public API (so users can just set in the config) with a
proper name.


Otherwise, I like the KIP a lot! Thanks.


-Matthias


On 11/1/17 12:23 AM, Matt Farmer wrote:
> Thanks for the heads up. Yes, I think my changes are compatible with that
> PR, but there will be a merge conflict that happens whenever one of the PRs
> is merged. Happy to reconcile the changes in my PR if 4148 goes in first. :)
> 
> On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com> wrote:
> 
>> That sounds reasonable, thanks Matt.
>>
>> As for the implementation, please note that there is another ongoing PR
>> that may touch the same classes that you are working on:
>> https://github.com/apache/kafka/pull/4148
>>
>> So it may help if you can also take a look at that PR and see if it is
>> compatible with your changes.
>>
>>
>>
>> Guozhang
>>
>>
>> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:
>>
>>> I've opened this pull request to implement the KIP as currently written:
>>> https://github.com/apache/kafka/pull/4165. It still needs some tests
>>> added,
>>> but largely represents the shape I was going for.
>>>
>>> If there are more points that folks would like to discuss, please let me
>>> know. If I don't hear anything by tomorrow afternoon I'll probably start
>> a
>>> [VOTE] thread.
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
>>>
>>>> I can’t think of a reason that would be problematic.
>>>>
>>>> Most of the time I would write a handler like this, I either want to
>>>> ignore the error or fail and bring everything down so that I can spin
>> it
>>>> back up later and resume from earlier offsets. When we start up after
>>>> crashing we’ll eventually try to process the message we failed to
>> produce
>>>> again.
>>>>
>>>> I’m concerned that “putting in a queue for later” opens you up to
>> putting
>>>> messages into the destination topic in an unexpected order. However if
>>>> others feel differently, I’m happy to talk about it.
>>>>
>>>> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
>>> wrote:
>>>>
>>>>>> Please correct me if I'm wrong, but my understanding is that the
>>> record
>>>>>> metadata is always null if an exception occurred while trying to
>>>>> produce.
>>>>>
>>>>> That is right. Thanks.
>>>>>
>>>>> I looked at the example code, and one thing I realized that since we
>> are
>>>>> not passing the context in the handle function, we may not be
>> implement
>>>>> the
>>>>> logic to send the fail records into another queue for future
>> processing.
>>>>> Would people think that would be a big issue?
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I've updated the KIP based on this conversation, and made it so that
>>> its
>>>>>> interface, config setting, and parameters line up more closely with
>>> the
>>>>>> interface in KIP-161 (deserialization handler).
>>>>>>
>>>>>> I believe there are a few specific questions I need to reply to.....
>>>>>>
>>>>>>> The question I had about then handle parameters are around the
>>> record,
>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>>>>> extends
>>>>>>> Object, ? extends Object>`?
>>>>>>
>>>>>> At this point in the code we're guaranteed that this is a
>>>>>> ProducerRecord<byte[], byte[]>, so the generics would just make it
>>>>> harder
>>>>>> to work with the key and value.
>>>>>>
>>>>>>> Also, should the handle function include the `RecordMetadata` as
>>> well
>>>>> in
>>>>>>> case it is not null?
>>>>>>
>>>>>> Please correct me if I'm wrong, but my understanding is that the
>>> record
>>>>>> metadata is always null if an exception occurred while trying to
>>>>> produce.
>>>>>>
>>>>>>> We may probably try to write down at least the following handling
>>>>> logic
>>>>>> and
>>>>>>> see if the given API is sufficient for it
>>>>>>
>>>>>> I've added some examples to the KIP. Let me know what you think.
>>>>>>
>>>>>> Cheers,
>>>>>> Matt
>>>>>>
>>>>>> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
>>>>>>
>>>>>>> Thanks for this feedback. I’m at a conference right now and am
>>>>> planning
>>>>>> on
>>>>>>> updating the KIP again with details from this conversation later
>>> this
>>>>>> week.
>>>>>>>
>>>>>>> I’ll shoot you a more detailed response then! :)
>>>>>>> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangguoz@gmail.com
>>>
>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the KIP Matt.
>>>>>>>>
>>>>>>>> Regarding the handle interface of ProductionExceptionHandlerResp
>>> onse,
>>>>>>>> could
>>>>>>>> you write it on the wiki also, along with the actual added config
>>>>> names
>>>>>>>> (e.g. what
>>>>>>>>
>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>>>>>> deserialization+exception+handlers
>>>>>>>> described).
>>>>>>>>
>>>>>>>> The question I had about then handle parameters are around the
>>>>> record,
>>>>>>>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>>>>>>>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>>>>> extends
>>>>>>>> Object, ? extends Object>`?
>>>>>>>>
>>>>>>>> Also, should the handle function include the `RecordMetadata` as
>>>>> well in
>>>>>>>> case it is not null?
>>>>>>>>
>>>>>>>> We may probably try to write down at least the following handling
>>>>> logic
>>>>>>>> and
>>>>>>>> see if the given API is sufficient for it: 1) throw exception
>>>>>> immediately
>>>>>>>> to fail fast and stop the world, 2) log the error and drop record
>>> and
>>>>>>>> proceed silently, 3) send such errors to a specific "error" Kafka
>>>>> topic,
>>>>>>>> or
>>>>>>>> record it as an app-level metrics (
>>>>>>>> https://kafka.apache.org/documentation/#kafka_streams_monitoring
>> )
>>>>> for
>>>>>>>> monitoring.
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
>> wrote:
>>>>>>>>
>>>>>>>>> I did some more digging tonight.
>>>>>>>>>
>>>>>>>>> @Ted: It looks like the deserialization handler uses
>>>>>>>>> "default.deserialization.exception.handler" for the config
>>> name. No
>>>>>>>>> ".class" on the end. I'm inclined to think this should use
>>>>>>>>> "default.production.exception.handler".
>>>>>>>>>
>>>>>>>>> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
>>> wrote:
>>>>>>>>>
>>>>>>>>>> Okay, I've dug into this a little bit.
>>>>>>>>>>
>>>>>>>>>> I think getting access to the serialized record is possible,
>>> and
>>>>>>>> changing
>>>>>>>>>> the naming and return type is certainly doable. However,
>>> because
>>>>>> we're
>>>>>>>>>> hooking into the onCompletion callback we have no guarantee
>>> that
>>>>> the
>>>>>>>>>> ProcessorContext state hasn't changed by the time this
>>> particular
>>>>>>>> handler
>>>>>>>>>> runs. So I think the signature would change to something
>> like:
>>>>>>>>>>
>>>>>>>>>> ProductionExceptionHandlerResponse handle(final
>>>>> ProducerRecord<..>
>>>>>>>>> record,
>>>>>>>>>> final Exception exception)
>>>>>>>>>>
>>>>>>>>>> Would this be acceptable?
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ah good idea. Hmmm. I can line up the naming and return type
>>> but
>>>>>> I’m
>>>>>>>> not
>>>>>>>>>>> sure if I can get my hands on the context and the record
>>> itself
>>>>>>>> without
>>>>>>>>>>> other changes.
>>>>>>>>>>>
>>>>>>>>>>> Let me dig in and follow up here tomorrow.
>>>>>>>>>>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>>>>>>>> matthias@confluent.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the KIP.
>>>>>>>>>>>>
>>>>>>>>>>>> Are you familiar with KIP-161?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>>>>>>>>> deserialization+exception+handlers
>>>>>>>>>>>>
>>>>>>>>>>>> I thinks, we should align the design (parameter naming,
>>> return
>>>>>>>> types,
>>>>>>>>>>>> class names etc) of KIP-210 to KIP-161 to get a unified
>> user
>>>>>>>>> experience.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>>>>>>>>>>>> I’ll create the JIRA ticket.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that config name will work. I’ll update the KIP
>>>>>>>> accordingly.
>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
>>> yuzhihong@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you create JIRA that corresponds to the KIP ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For the new config, how about naming it
>>>>>>>>>>>>>> production.exception.processor.class
>>>>>>>>>>>>>> ? This way it is clear that class name should be
>>> specified.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
>>> matt@frmr.me>
>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is the discussion thread for the KIP that I just
>>> filed
>>>>>>>> here:
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>>>>>>>>>>>>>> Streams+fails+to+produce
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to getting some feedback from folks
>> about
>>>>> this
>>>>>>>> idea
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> working toward a solution we can contribute back. :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Matt Farmer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -- Guozhang
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> -- Guozhang
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Thanks for the heads up. Yes, I think my changes are compatible with that
PR, but there will be a merge conflict that happens whenever one of the PRs
is merged. Happy to reconcile the changes in my PR if 4148 goes in first. :)

On Tue, Oct 31, 2017 at 6:44 PM Guozhang Wang <wa...@gmail.com> wrote:

> That sounds reasonable, thanks Matt.
>
> As for the implementation, please note that there is another ongoing PR
> that may touch the same classes that you are working on:
> https://github.com/apache/kafka/pull/4148
>
> So it may help if you can also take a look at that PR and see if it is
> compatible with your changes.
>
>
>
> Guozhang
>
>
> On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:
>
> > I've opened this pull request to implement the KIP as currently written:
> > https://github.com/apache/kafka/pull/4165. It still needs some tests
> > added,
> > but largely represents the shape I was going for.
> >
> > If there are more points that folks would like to discuss, please let me
> > know. If I don't hear anything by tomorrow afternoon I'll probably start
> a
> > [VOTE] thread.
> >
> > Thanks,
> > Matt
> >
> > On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> >
> > > I can’t think of a reason that would be problematic.
> > >
> > > Most of the time I would write a handler like this, I either want to
> > > ignore the error or fail and bring everything down so that I can spin
> it
> > > back up later and resume from earlier offsets. When we start up after
> > > crashing we’ll eventually try to process the message we failed to
> produce
> > > again.
> > >
> > > I’m concerned that “putting in a queue for later” opens you up to
> putting
> > > messages into the destination topic in an unexpected order. However if
> > > others feel differently, I’m happy to talk about it.
> > >
> > > On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > >> > Please correct me if I'm wrong, but my understanding is that the
> > record
> > >> > metadata is always null if an exception occurred while trying to
> > >> produce.
> > >>
> > >> That is right. Thanks.
> > >>
> > >> I looked at the example code, and one thing I realized that since we
> are
> > >> not passing the context in the handle function, we may not be
> implement
> > >> the
> > >> logic to send the fail records into another queue for future
> processing.
> > >> Would people think that would be a big issue?
> > >>
> > >>
> > >> Guozhang
> > >>
> > >>
> > >> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
> > >>
> > >> > Hello all,
> > >> >
> > >> > I've updated the KIP based on this conversation, and made it so that
> > its
> > >> > interface, config setting, and parameters line up more closely with
> > the
> > >> > interface in KIP-161 (deserialization handler).
> > >> >
> > >> > I believe there are a few specific questions I need to reply to.....
> > >> >
> > >> > > The question I had about then handle parameters are around the
> > record,
> > >> > > should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> > >> > > `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> > >> extends
> > >> > > Object, ? extends Object>`?
> > >> >
> > >> > At this point in the code we're guaranteed that this is a
> > >> > ProducerRecord<byte[], byte[]>, so the generics would just make it
> > >> harder
> > >> > to work with the key and value.
> > >> >
> > >> > > Also, should the handle function include the `RecordMetadata` as
> > well
> > >> in
> > >> > > case it is not null?
> > >> >
> > >> > Please correct me if I'm wrong, but my understanding is that the
> > record
> > >> > metadata is always null if an exception occurred while trying to
> > >> produce.
> > >> >
> > >> > > We may probably try to write down at least the following handling
> > >> logic
> > >> > and
> > >> > > see if the given API is sufficient for it
> > >> >
> > >> > I've added some examples to the KIP. Let me know what you think.
> > >> >
> > >> > Cheers,
> > >> > Matt
> > >> >
> > >> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
> > >> >
> > >> > > Thanks for this feedback. I’m at a conference right now and am
> > >> planning
> > >> > on
> > >> > > updating the KIP again with details from this conversation later
> > this
> > >> > week.
> > >> > >
> > >> > > I’ll shoot you a more detailed response then! :)
> > >> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wangguoz@gmail.com
> >
> > >> > wrote:
> > >> > >
> > >> > >> Thanks for the KIP Matt.
> > >> > >>
> > >> > >> Regarding the handle interface of ProductionExceptionHandlerResp
> > onse,
> > >> > >> could
> > >> > >> you write it on the wiki also, along with the actual added config
> > >> names
> > >> > >> (e.g. what
> > >> > >>
> > >> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > >> > deserialization+exception+handlers
> > >> > >> described).
> > >> > >>
> > >> > >> The question I had about then handle parameters are around the
> > >> record,
> > >> > >> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> > >> > >> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> > >> extends
> > >> > >> Object, ? extends Object>`?
> > >> > >>
> > >> > >> Also, should the handle function include the `RecordMetadata` as
> > >> well in
> > >> > >> case it is not null?
> > >> > >>
> > >> > >> We may probably try to write down at least the following handling
> > >> logic
> > >> > >> and
> > >> > >> see if the given API is sufficient for it: 1) throw exception
> > >> > immediately
> > >> > >> to fail fast and stop the world, 2) log the error and drop record
> > and
> > >> > >> proceed silently, 3) send such errors to a specific "error" Kafka
> > >> topic,
> > >> > >> or
> > >> > >> record it as an app-level metrics (
> > >> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring
> )
> > >> for
> > >> > >> monitoring.
> > >> > >>
> > >> > >> Guozhang
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me>
> wrote:
> > >> > >>
> > >> > >> > I did some more digging tonight.
> > >> > >> >
> > >> > >> > @Ted: It looks like the deserialization handler uses
> > >> > >> > "default.deserialization.exception.handler" for the config
> > name. No
> > >> > >> > ".class" on the end. I'm inclined to think this should use
> > >> > >> > "default.production.exception.handler".
> > >> > >> >
> > >> > >> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
> > wrote:
> > >> > >> >
> > >> > >> > > Okay, I've dug into this a little bit.
> > >> > >> > >
> > >> > >> > > I think getting access to the serialized record is possible,
> > and
> > >> > >> changing
> > >> > >> > > the naming and return type is certainly doable. However,
> > because
> > >> > we're
> > >> > >> > > hooking into the onCompletion callback we have no guarantee
> > that
> > >> the
> > >> > >> > > ProcessorContext state hasn't changed by the time this
> > particular
> > >> > >> handler
> > >> > >> > > runs. So I think the signature would change to something
> like:
> > >> > >> > >
> > >> > >> > > ProductionExceptionHandlerResponse handle(final
> > >> ProducerRecord<..>
> > >> > >> > record,
> > >> > >> > > final Exception exception)
> > >> > >> > >
> > >> > >> > > Would this be acceptable?
> > >> > >> > >
> > >> > >> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> > >> wrote:
> > >> > >> > >
> > >> > >> > >> Ah good idea. Hmmm. I can line up the naming and return type
> > but
> > >> > I’m
> > >> > >> not
> > >> > >> > >> sure if I can get my hands on the context and the record
> > itself
> > >> > >> without
> > >> > >> > >> other changes.
> > >> > >> > >>
> > >> > >> > >> Let me dig in and follow up here tomorrow.
> > >> > >> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> > >> > >> matthias@confluent.io>
> > >> > >> > >> wrote:
> > >> > >> > >>
> > >> > >> > >>> Thanks for the KIP.
> > >> > >> > >>>
> > >> > >> > >>> Are you familiar with KIP-161?
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > >> > >> > deserialization+exception+handlers
> > >> > >> > >>>
> > >> > >> > >>> I thinks, we should align the design (parameter naming,
> > return
> > >> > >> types,
> > >> > >> > >>> class names etc) of KIP-210 to KIP-161 to get a unified
> user
> > >> > >> > experience.
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >> > >>> -Matthias
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >> > >> > >>> > I’ll create the JIRA ticket.
> > >> > >> > >>> >
> > >> > >> > >>> > I think that config name will work. I’ll update the KIP
> > >> > >> accordingly.
> > >> > >> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> > yuzhihong@gmail.com>
> > >> > >> wrote:
> > >> > >> > >>> >
> > >> > >> > >>> >> Can you create JIRA that corresponds to the KIP ?
> > >> > >> > >>> >>
> > >> > >> > >>> >> For the new config, how about naming it
> > >> > >> > >>> >> production.exception.processor.class
> > >> > >> > >>> >> ? This way it is clear that class name should be
> > specified.
> > >> > >> > >>> >>
> > >> > >> > >>> >> Cheers
> > >> > >> > >>> >>
> > >> > >> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> > matt@frmr.me>
> > >> > >> wrote:
> > >> > >> > >>> >>
> > >> > >> > >>> >>> Hello everyone,
> > >> > >> > >>> >>>
> > >> > >> > >>> >>> This is the discussion thread for the KIP that I just
> > filed
> > >> > >> here:
> > >> > >> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > >> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> > >> > >> > >>> >>> Streams+fails+to+produce
> > >> > >> > >>> >>>
> > >> > >> > >>> >>> Looking forward to getting some feedback from folks
> about
> > >> this
> > >> > >> idea
> > >> > >> > >>> and
> > >> > >> > >>> >>> working toward a solution we can contribute back. :)
> > >> > >> > >>> >>>
> > >> > >> > >>> >>> Cheers,
> > >> > >> > >>> >>> Matt Farmer
> > >> > >> > >>> >>>
> > >> > >> > >>> >>
> > >> > >> > >>> >
> > >> > >> > >>>
> > >> > >> > >>>
> > >> > >> >
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> --
> > >> > >> -- Guozhang
> > >> > >>
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
That sounds reasonable, thanks Matt.

As for the implementation, please note that there is another ongoing PR
that may touch the same classes that you are working on:
https://github.com/apache/kafka/pull/4148

So it may help if you can also take a look at that PR and see if it is
compatible with your changes.



Guozhang


On Tue, Oct 31, 2017 at 10:59 AM, Matt Farmer <ma...@frmr.me> wrote:

> I've opened this pull request to implement the KIP as currently written:
> https://github.com/apache/kafka/pull/4165. It still needs some tests
> added,
> but largely represents the shape I was going for.
>
> If there are more points that folks would like to discuss, please let me
> know. If I don't hear anything by tomorrow afternoon I'll probably start a
> [VOTE] thread.
>
> Thanks,
> Matt
>
> On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
>
> > I can’t think of a reason that would be problematic.
> >
> > Most of the time I would write a handler like this, I either want to
> > ignore the error or fail and bring everything down so that I can spin it
> > back up later and resume from earlier offsets. When we start up after
> > crashing we’ll eventually try to process the message we failed to produce
> > again.
> >
> > I’m concerned that “putting in a queue for later” opens you up to putting
> > messages into the destination topic in an unexpected order. However if
> > others feel differently, I’m happy to talk about it.
> >
> > On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> >> > Please correct me if I'm wrong, but my understanding is that the
> record
> >> > metadata is always null if an exception occurred while trying to
> >> produce.
> >>
> >> That is right. Thanks.
> >>
> >> I looked at the example code, and one thing I realized that since we are
> >> not passing the context in the handle function, we may not be implement
> >> the
> >> logic to send the fail records into another queue for future processing.
> >> Would people think that would be a big issue?
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>
> >> > Hello all,
> >> >
> >> > I've updated the KIP based on this conversation, and made it so that
> its
> >> > interface, config setting, and parameters line up more closely with
> the
> >> > interface in KIP-161 (deserialization handler).
> >> >
> >> > I believe there are a few specific questions I need to reply to.....
> >> >
> >> > > The question I had about then handle parameters are around the
> record,
> >> > > should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >> > > `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >> extends
> >> > > Object, ? extends Object>`?
> >> >
> >> > At this point in the code we're guaranteed that this is a
> >> > ProducerRecord<byte[], byte[]>, so the generics would just make it
> >> harder
> >> > to work with the key and value.
> >> >
> >> > > Also, should the handle function include the `RecordMetadata` as
> well
> >> in
> >> > > case it is not null?
> >> >
> >> > Please correct me if I'm wrong, but my understanding is that the
> record
> >> > metadata is always null if an exception occurred while trying to
> >> produce.
> >> >
> >> > > We may probably try to write down at least the following handling
> >> logic
> >> > and
> >> > > see if the given API is sufficient for it
> >> >
> >> > I've added some examples to the KIP. Let me know what you think.
> >> >
> >> > Cheers,
> >> > Matt
> >> >
> >> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
> >> >
> >> > > Thanks for this feedback. I’m at a conference right now and am
> >> planning
> >> > on
> >> > > updating the KIP again with details from this conversation later
> this
> >> > week.
> >> > >
> >> > > I’ll shoot you a more detailed response then! :)
> >> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com>
> >> > wrote:
> >> > >
> >> > >> Thanks for the KIP Matt.
> >> > >>
> >> > >> Regarding the handle interface of ProductionExceptionHandlerResp
> onse,
> >> > >> could
> >> > >> you write it on the wiki also, along with the actual added config
> >> names
> >> > >> (e.g. what
> >> > >>
> >> > >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >> > deserialization+exception+handlers
> >> > >> described).
> >> > >>
> >> > >> The question I had about then handle parameters are around the
> >> record,
> >> > >> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >> > >> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> >> extends
> >> > >> Object, ? extends Object>`?
> >> > >>
> >> > >> Also, should the handle function include the `RecordMetadata` as
> >> well in
> >> > >> case it is not null?
> >> > >>
> >> > >> We may probably try to write down at least the following handling
> >> logic
> >> > >> and
> >> > >> see if the given API is sufficient for it: 1) throw exception
> >> > immediately
> >> > >> to fail fast and stop the world, 2) log the error and drop record
> and
> >> > >> proceed silently, 3) send such errors to a specific "error" Kafka
> >> topic,
> >> > >> or
> >> > >> record it as an app-level metrics (
> >> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring)
> >> for
> >> > >> monitoring.
> >> > >>
> >> > >> Guozhang
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
> >> > >>
> >> > >> > I did some more digging tonight.
> >> > >> >
> >> > >> > @Ted: It looks like the deserialization handler uses
> >> > >> > "default.deserialization.exception.handler" for the config
> name. No
> >> > >> > ".class" on the end. I'm inclined to think this should use
> >> > >> > "default.production.exception.handler".
> >> > >> >
> >> > >> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me>
> wrote:
> >> > >> >
> >> > >> > > Okay, I've dug into this a little bit.
> >> > >> > >
> >> > >> > > I think getting access to the serialized record is possible,
> and
> >> > >> changing
> >> > >> > > the naming and return type is certainly doable. However,
> because
> >> > we're
> >> > >> > > hooking into the onCompletion callback we have no guarantee
> that
> >> the
> >> > >> > > ProcessorContext state hasn't changed by the time this
> particular
> >> > >> handler
> >> > >> > > runs. So I think the signature would change to something like:
> >> > >> > >
> >> > >> > > ProductionExceptionHandlerResponse handle(final
> >> ProducerRecord<..>
> >> > >> > record,
> >> > >> > > final Exception exception)
> >> > >> > >
> >> > >> > > Would this be acceptable?
> >> > >> > >
> >> > >> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
> >> wrote:
> >> > >> > >
> >> > >> > >> Ah good idea. Hmmm. I can line up the naming and return type
> but
> >> > I’m
> >> > >> not
> >> > >> > >> sure if I can get my hands on the context and the record
> itself
> >> > >> without
> >> > >> > >> other changes.
> >> > >> > >>
> >> > >> > >> Let me dig in and follow up here tomorrow.
> >> > >> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >> > >> matthias@confluent.io>
> >> > >> > >> wrote:
> >> > >> > >>
> >> > >> > >>> Thanks for the KIP.
> >> > >> > >>>
> >> > >> > >>> Are you familiar with KIP-161?
> >> > >> > >>>
> >> > >> > >>>
> >> > >> > >>>
> >> > >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >> > >> > deserialization+exception+handlers
> >> > >> > >>>
> >> > >> > >>> I thinks, we should align the design (parameter naming,
> return
> >> > >> types,
> >> > >> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
> >> > >> > experience.
> >> > >> > >>>
> >> > >> > >>>
> >> > >> > >>>
> >> > >> > >>> -Matthias
> >> > >> > >>>
> >> > >> > >>>
> >> > >> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >> > >> > >>> > I’ll create the JIRA ticket.
> >> > >> > >>> >
> >> > >> > >>> > I think that config name will work. I’ll update the KIP
> >> > >> accordingly.
> >> > >> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <
> yuzhihong@gmail.com>
> >> > >> wrote:
> >> > >> > >>> >
> >> > >> > >>> >> Can you create JIRA that corresponds to the KIP ?
> >> > >> > >>> >>
> >> > >> > >>> >> For the new config, how about naming it
> >> > >> > >>> >> production.exception.processor.class
> >> > >> > >>> >> ? This way it is clear that class name should be
> specified.
> >> > >> > >>> >>
> >> > >> > >>> >> Cheers
> >> > >> > >>> >>
> >> > >> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <
> matt@frmr.me>
> >> > >> wrote:
> >> > >> > >>> >>
> >> > >> > >>> >>> Hello everyone,
> >> > >> > >>> >>>
> >> > >> > >>> >>> This is the discussion thread for the KIP that I just
> filed
> >> > >> here:
> >> > >> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > >> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >> > >> > >>> >>> Streams+fails+to+produce
> >> > >> > >>> >>>
> >> > >> > >>> >>> Looking forward to getting some feedback from folks about
> >> this
> >> > >> idea
> >> > >> > >>> and
> >> > >> > >>> >>> working toward a solution we can contribute back. :)
> >> > >> > >>> >>>
> >> > >> > >>> >>> Cheers,
> >> > >> > >>> >>> Matt Farmer
> >> > >> > >>> >>>
> >> > >> > >>> >>
> >> > >> > >>> >
> >> > >> > >>>
> >> > >> > >>>
> >> > >> >
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> -- Guozhang
> >> > >>
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I've opened this pull request to implement the KIP as currently written:
https://github.com/apache/kafka/pull/4165. It still needs some tests added,
but largely represents the shape I was going for.

If there are more points that folks would like to discuss, please let me
know. If I don't hear anything by tomorrow afternoon I'll probably start a
[VOTE] thread.

Thanks,
Matt

On Fri, Oct 27, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:

> I can’t think of a reason that would be problematic.
>
> Most of the time I would write a handler like this, I either want to
> ignore the error or fail and bring everything down so that I can spin it
> back up later and resume from earlier offsets. When we start up after
> crashing we’ll eventually try to process the message we failed to produce
> again.
>
> I’m concerned that “putting in a queue for later” opens you up to putting
> messages into the destination topic in an unexpected order. However if
> others feel differently, I’m happy to talk about it.
>
> On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com> wrote:
>
>> > Please correct me if I'm wrong, but my understanding is that the record
>> > metadata is always null if an exception occurred while trying to
>> produce.
>>
>> That is right. Thanks.
>>
>> I looked at the example code, and one thing I realized that since we are
>> not passing the context in the handle function, we may not be implement
>> the
>> logic to send the fail records into another queue for future processing.
>> Would people think that would be a big issue?
>>
>>
>> Guozhang
>>
>>
>> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
>>
>> > Hello all,
>> >
>> > I've updated the KIP based on this conversation, and made it so that its
>> > interface, config setting, and parameters line up more closely with the
>> > interface in KIP-161 (deserialization handler).
>> >
>> > I believe there are a few specific questions I need to reply to.....
>> >
>> > > The question I had about then handle parameters are around the record,
>> > > should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>> > > `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>> extends
>> > > Object, ? extends Object>`?
>> >
>> > At this point in the code we're guaranteed that this is a
>> > ProducerRecord<byte[], byte[]>, so the generics would just make it
>> harder
>> > to work with the key and value.
>> >
>> > > Also, should the handle function include the `RecordMetadata` as well
>> in
>> > > case it is not null?
>> >
>> > Please correct me if I'm wrong, but my understanding is that the record
>> > metadata is always null if an exception occurred while trying to
>> produce.
>> >
>> > > We may probably try to write down at least the following handling
>> logic
>> > and
>> > > see if the given API is sufficient for it
>> >
>> > I've added some examples to the KIP. Let me know what you think.
>> >
>> > Cheers,
>> > Matt
>> >
>> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
>> >
>> > > Thanks for this feedback. I’m at a conference right now and am
>> planning
>> > on
>> > > updating the KIP again with details from this conversation later this
>> > week.
>> > >
>> > > I’ll shoot you a more detailed response then! :)
>> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com>
>> > wrote:
>> > >
>> > >> Thanks for the KIP Matt.
>> > >>
>> > >> Regarding the handle interface of ProductionExceptionHandlerResponse,
>> > >> could
>> > >> you write it on the wiki also, along with the actual added config
>> names
>> > >> (e.g. what
>> > >>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> > deserialization+exception+handlers
>> > >> described).
>> > >>
>> > >> The question I had about then handle parameters are around the
>> record,
>> > >> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>> > >> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
>> extends
>> > >> Object, ? extends Object>`?
>> > >>
>> > >> Also, should the handle function include the `RecordMetadata` as
>> well in
>> > >> case it is not null?
>> > >>
>> > >> We may probably try to write down at least the following handling
>> logic
>> > >> and
>> > >> see if the given API is sufficient for it: 1) throw exception
>> > immediately
>> > >> to fail fast and stop the world, 2) log the error and drop record and
>> > >> proceed silently, 3) send such errors to a specific "error" Kafka
>> topic,
>> > >> or
>> > >> record it as an app-level metrics (
>> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring)
>> for
>> > >> monitoring.
>> > >>
>> > >> Guozhang
>> > >>
>> > >>
>> > >>
>> > >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
>> > >>
>> > >> > I did some more digging tonight.
>> > >> >
>> > >> > @Ted: It looks like the deserialization handler uses
>> > >> > "default.deserialization.exception.handler" for the config name. No
>> > >> > ".class" on the end. I'm inclined to think this should use
>> > >> > "default.production.exception.handler".
>> > >> >
>> > >> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
>> > >> >
>> > >> > > Okay, I've dug into this a little bit.
>> > >> > >
>> > >> > > I think getting access to the serialized record is possible, and
>> > >> changing
>> > >> > > the naming and return type is certainly doable. However, because
>> > we're
>> > >> > > hooking into the onCompletion callback we have no guarantee that
>> the
>> > >> > > ProcessorContext state hasn't changed by the time this particular
>> > >> handler
>> > >> > > runs. So I think the signature would change to something like:
>> > >> > >
>> > >> > > ProductionExceptionHandlerResponse handle(final
>> ProducerRecord<..>
>> > >> > record,
>> > >> > > final Exception exception)
>> > >> > >
>> > >> > > Would this be acceptable?
>> > >> > >
>> > >> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me>
>> wrote:
>> > >> > >
>> > >> > >> Ah good idea. Hmmm. I can line up the naming and return type but
>> > I’m
>> > >> not
>> > >> > >> sure if I can get my hands on the context and the record itself
>> > >> without
>> > >> > >> other changes.
>> > >> > >>
>> > >> > >> Let me dig in and follow up here tomorrow.
>> > >> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> > >> matthias@confluent.io>
>> > >> > >> wrote:
>> > >> > >>
>> > >> > >>> Thanks for the KIP.
>> > >> > >>>
>> > >> > >>> Are you familiar with KIP-161?
>> > >> > >>>
>> > >> > >>>
>> > >> > >>>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> > >> > deserialization+exception+handlers
>> > >> > >>>
>> > >> > >>> I thinks, we should align the design (parameter naming, return
>> > >> types,
>> > >> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
>> > >> > experience.
>> > >> > >>>
>> > >> > >>>
>> > >> > >>>
>> > >> > >>> -Matthias
>> > >> > >>>
>> > >> > >>>
>> > >> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> > >> > >>> > I’ll create the JIRA ticket.
>> > >> > >>> >
>> > >> > >>> > I think that config name will work. I’ll update the KIP
>> > >> accordingly.
>> > >> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com>
>> > >> wrote:
>> > >> > >>> >
>> > >> > >>> >> Can you create JIRA that corresponds to the KIP ?
>> > >> > >>> >>
>> > >> > >>> >> For the new config, how about naming it
>> > >> > >>> >> production.exception.processor.class
>> > >> > >>> >> ? This way it is clear that class name should be specified.
>> > >> > >>> >>
>> > >> > >>> >> Cheers
>> > >> > >>> >>
>> > >> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me>
>> > >> wrote:
>> > >> > >>> >>
>> > >> > >>> >>> Hello everyone,
>> > >> > >>> >>>
>> > >> > >>> >>> This is the discussion thread for the KIP that I just filed
>> > >> here:
>> > >> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > >> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>> > >> > >>> >>> Streams+fails+to+produce
>> > >> > >>> >>>
>> > >> > >>> >>> Looking forward to getting some feedback from folks about
>> this
>> > >> idea
>> > >> > >>> and
>> > >> > >>> >>> working toward a solution we can contribute back. :)
>> > >> > >>> >>>
>> > >> > >>> >>> Cheers,
>> > >> > >>> >>> Matt Farmer
>> > >> > >>> >>>
>> > >> > >>> >>
>> > >> > >>> >
>> > >> > >>>
>> > >> > >>>
>> > >> >
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> -- Guozhang
>> > >>
>> > >
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I can’t think of a reason that would be problematic.

Most of the time I would write a handler like this, I either want to ignore
the error or fail and bring everything down so that I can spin it back up
later and resume from earlier offsets. When we start up after crashing
we’ll eventually try to process the message we failed to produce again.

I’m concerned that “putting in a queue for later” opens you up to putting
messages into the destination topic in an unexpected order. However if
others feel differently, I’m happy to talk about it.

On Fri, Oct 27, 2017 at 7:10 PM Guozhang Wang <wa...@gmail.com> wrote:

> > Please correct me if I'm wrong, but my understanding is that the record
> > metadata is always null if an exception occurred while trying to produce.
>
> That is right. Thanks.
>
> I looked at the example code, and one thing I realized that since we are
> not passing the context in the handle function, we may not be implement the
> logic to send the fail records into another queue for future processing.
> Would people think that would be a big issue?
>
>
> Guozhang
>
>
> On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:
>
> > Hello all,
> >
> > I've updated the KIP based on this conversation, and made it so that its
> > interface, config setting, and parameters line up more closely with the
> > interface in KIP-161 (deserialization handler).
> >
> > I believe there are a few specific questions I need to reply to.....
> >
> > > The question I had about then handle parameters are around the record,
> > > should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> > > `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
> > > Object, ? extends Object>`?
> >
> > At this point in the code we're guaranteed that this is a
> > ProducerRecord<byte[], byte[]>, so the generics would just make it harder
> > to work with the key and value.
> >
> > > Also, should the handle function include the `RecordMetadata` as well
> in
> > > case it is not null?
> >
> > Please correct me if I'm wrong, but my understanding is that the record
> > metadata is always null if an exception occurred while trying to produce.
> >
> > > We may probably try to write down at least the following handling logic
> > and
> > > see if the given API is sufficient for it
> >
> > I've added some examples to the KIP. Let me know what you think.
> >
> > Cheers,
> > Matt
> >
> > On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
> >
> > > Thanks for this feedback. I’m at a conference right now and am planning
> > on
> > > updating the KIP again with details from this conversation later this
> > week.
> > >
> > > I’ll shoot you a more detailed response then! :)
> > > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com>
> > wrote:
> > >
> > >> Thanks for the KIP Matt.
> > >>
> > >> Regarding the handle interface of ProductionExceptionHandlerResponse,
> > >> could
> > >> you write it on the wiki also, along with the actual added config
> names
> > >> (e.g. what
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > deserialization+exception+handlers
> > >> described).
> > >>
> > >> The question I had about then handle parameters are around the record,
> > >> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> > >> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<?
> extends
> > >> Object, ? extends Object>`?
> > >>
> > >> Also, should the handle function include the `RecordMetadata` as well
> in
> > >> case it is not null?
> > >>
> > >> We may probably try to write down at least the following handling
> logic
> > >> and
> > >> see if the given API is sufficient for it: 1) throw exception
> > immediately
> > >> to fail fast and stop the world, 2) log the error and drop record and
> > >> proceed silently, 3) send such errors to a specific "error" Kafka
> topic,
> > >> or
> > >> record it as an app-level metrics (
> > >> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
> > >> monitoring.
> > >>
> > >> Guozhang
> > >>
> > >>
> > >>
> > >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
> > >>
> > >> > I did some more digging tonight.
> > >> >
> > >> > @Ted: It looks like the deserialization handler uses
> > >> > "default.deserialization.exception.handler" for the config name. No
> > >> > ".class" on the end. I'm inclined to think this should use
> > >> > "default.production.exception.handler".
> > >> >
> > >> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
> > >> >
> > >> > > Okay, I've dug into this a little bit.
> > >> > >
> > >> > > I think getting access to the serialized record is possible, and
> > >> changing
> > >> > > the naming and return type is certainly doable. However, because
> > we're
> > >> > > hooking into the onCompletion callback we have no guarantee that
> the
> > >> > > ProcessorContext state hasn't changed by the time this particular
> > >> handler
> > >> > > runs. So I think the signature would change to something like:
> > >> > >
> > >> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
> > >> > record,
> > >> > > final Exception exception)
> > >> > >
> > >> > > Would this be acceptable?
> > >> > >
> > >> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> > >> > >
> > >> > >> Ah good idea. Hmmm. I can line up the naming and return type but
> > I’m
> > >> not
> > >> > >> sure if I can get my hands on the context and the record itself
> > >> without
> > >> > >> other changes.
> > >> > >>
> > >> > >> Let me dig in and follow up here tomorrow.
> > >> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> > >> matthias@confluent.io>
> > >> > >> wrote:
> > >> > >>
> > >> > >>> Thanks for the KIP.
> > >> > >>>
> > >> > >>> Are you familiar with KIP-161?
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > >> > deserialization+exception+handlers
> > >> > >>>
> > >> > >>> I thinks, we should align the design (parameter naming, return
> > >> types,
> > >> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
> > >> > experience.
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> -Matthias
> > >> > >>>
> > >> > >>>
> > >> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >> > >>> > I’ll create the JIRA ticket.
> > >> > >>> >
> > >> > >>> > I think that config name will work. I’ll update the KIP
> > >> accordingly.
> > >> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com>
> > >> wrote:
> > >> > >>> >
> > >> > >>> >> Can you create JIRA that corresponds to the KIP ?
> > >> > >>> >>
> > >> > >>> >> For the new config, how about naming it
> > >> > >>> >> production.exception.processor.class
> > >> > >>> >> ? This way it is clear that class name should be specified.
> > >> > >>> >>
> > >> > >>> >> Cheers
> > >> > >>> >>
> > >> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me>
> > >> wrote:
> > >> > >>> >>
> > >> > >>> >>> Hello everyone,
> > >> > >>> >>>
> > >> > >>> >>> This is the discussion thread for the KIP that I just filed
> > >> here:
> > >> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> > >> > >>> >>> Streams+fails+to+produce
> > >> > >>> >>>
> > >> > >>> >>> Looking forward to getting some feedback from folks about
> this
> > >> idea
> > >> > >>> and
> > >> > >>> >>> working toward a solution we can contribute back. :)
> > >> > >>> >>>
> > >> > >>> >>> Cheers,
> > >> > >>> >>> Matt Farmer
> > >> > >>> >>>
> > >> > >>> >>
> > >> > >>> >
> > >> > >>>
> > >> > >>>
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
> Please correct me if I'm wrong, but my understanding is that the record
> metadata is always null if an exception occurred while trying to produce.

That is right. Thanks.

I looked at the example code, and one thing I realized that since we are
not passing the context in the handle function, we may not be implement the
logic to send the fail records into another queue for future processing.
Would people think that would be a big issue?


Guozhang


On Thu, Oct 26, 2017 at 12:14 PM, Matt Farmer <ma...@frmr.me> wrote:

> Hello all,
>
> I've updated the KIP based on this conversation, and made it so that its
> interface, config setting, and parameters line up more closely with the
> interface in KIP-161 (deserialization handler).
>
> I believe there are a few specific questions I need to reply to.....
>
> > The question I had about then handle parameters are around the record,
> > should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> > `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
> > Object, ? extends Object>`?
>
> At this point in the code we're guaranteed that this is a
> ProducerRecord<byte[], byte[]>, so the generics would just make it harder
> to work with the key and value.
>
> > Also, should the handle function include the `RecordMetadata` as well in
> > case it is not null?
>
> Please correct me if I'm wrong, but my understanding is that the record
> metadata is always null if an exception occurred while trying to produce.
>
> > We may probably try to write down at least the following handling logic
> and
> > see if the given API is sufficient for it
>
> I've added some examples to the KIP. Let me know what you think.
>
> Cheers,
> Matt
>
> On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:
>
> > Thanks for this feedback. I’m at a conference right now and am planning
> on
> > updating the KIP again with details from this conversation later this
> week.
> >
> > I’ll shoot you a more detailed response then! :)
> > On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> >> Thanks for the KIP Matt.
> >>
> >> Regarding the handle interface of ProductionExceptionHandlerResponse,
> >> could
> >> you write it on the wiki also, along with the actual added config names
> >> (e.g. what
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> deserialization+exception+handlers
> >> described).
> >>
> >> The question I had about then handle parameters are around the record,
> >> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> >> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
> >> Object, ? extends Object>`?
> >>
> >> Also, should the handle function include the `RecordMetadata` as well in
> >> case it is not null?
> >>
> >> We may probably try to write down at least the following handling logic
> >> and
> >> see if the given API is sufficient for it: 1) throw exception
> immediately
> >> to fail fast and stop the world, 2) log the error and drop record and
> >> proceed silently, 3) send such errors to a specific "error" Kafka topic,
> >> or
> >> record it as an app-level metrics (
> >> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
> >> monitoring.
> >>
> >> Guozhang
> >>
> >>
> >>
> >> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>
> >> > I did some more digging tonight.
> >> >
> >> > @Ted: It looks like the deserialization handler uses
> >> > "default.deserialization.exception.handler" for the config name. No
> >> > ".class" on the end. I'm inclined to think this should use
> >> > "default.production.exception.handler".
> >> >
> >> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
> >> >
> >> > > Okay, I've dug into this a little bit.
> >> > >
> >> > > I think getting access to the serialized record is possible, and
> >> changing
> >> > > the naming and return type is certainly doable. However, because
> we're
> >> > > hooking into the onCompletion callback we have no guarantee that the
> >> > > ProcessorContext state hasn't changed by the time this particular
> >> handler
> >> > > runs. So I think the signature would change to something like:
> >> > >
> >> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
> >> > record,
> >> > > final Exception exception)
> >> > >
> >> > > Would this be acceptable?
> >> > >
> >> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> >> > >
> >> > >> Ah good idea. Hmmm. I can line up the naming and return type but
> I’m
> >> not
> >> > >> sure if I can get my hands on the context and the record itself
> >> without
> >> > >> other changes.
> >> > >>
> >> > >> Let me dig in and follow up here tomorrow.
> >> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> >> matthias@confluent.io>
> >> > >> wrote:
> >> > >>
> >> > >>> Thanks for the KIP.
> >> > >>>
> >> > >>> Are you familiar with KIP-161?
> >> > >>>
> >> > >>>
> >> > >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> >> > deserialization+exception+handlers
> >> > >>>
> >> > >>> I thinks, we should align the design (parameter naming, return
> >> types,
> >> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
> >> > experience.
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> -Matthias
> >> > >>>
> >> > >>>
> >> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >> > >>> > I’ll create the JIRA ticket.
> >> > >>> >
> >> > >>> > I think that config name will work. I’ll update the KIP
> >> accordingly.
> >> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com>
> >> wrote:
> >> > >>> >
> >> > >>> >> Can you create JIRA that corresponds to the KIP ?
> >> > >>> >>
> >> > >>> >> For the new config, how about naming it
> >> > >>> >> production.exception.processor.class
> >> > >>> >> ? This way it is clear that class name should be specified.
> >> > >>> >>
> >> > >>> >> Cheers
> >> > >>> >>
> >> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me>
> >> wrote:
> >> > >>> >>
> >> > >>> >>> Hello everyone,
> >> > >>> >>>
> >> > >>> >>> This is the discussion thread for the KIP that I just filed
> >> here:
> >> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >> > >>> >>> Streams+fails+to+produce
> >> > >>> >>>
> >> > >>> >>> Looking forward to getting some feedback from folks about this
> >> idea
> >> > >>> and
> >> > >>> >>> working toward a solution we can contribute back. :)
> >> > >>> >>>
> >> > >>> >>> Cheers,
> >> > >>> >>> Matt Farmer
> >> > >>> >>>
> >> > >>> >>
> >> > >>> >
> >> > >>>
> >> > >>>
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Hello all,

I've updated the KIP based on this conversation, and made it so that its
interface, config setting, and parameters line up more closely with the
interface in KIP-161 (deserialization handler).

I believe there are a few specific questions I need to reply to.....

> The question I had about then handle parameters are around the record,
> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
> Object, ? extends Object>`?

At this point in the code we're guaranteed that this is a
ProducerRecord<byte[], byte[]>, so the generics would just make it harder
to work with the key and value.

> Also, should the handle function include the `RecordMetadata` as well in
> case it is not null?

Please correct me if I'm wrong, but my understanding is that the record
metadata is always null if an exception occurred while trying to produce.

> We may probably try to write down at least the following handling logic
and
> see if the given API is sufficient for it

I've added some examples to the KIP. Let me know what you think.

Cheers,
Matt

On Mon, Oct 23, 2017 at 9:00 PM Matt Farmer <ma...@frmr.me> wrote:

> Thanks for this feedback. I’m at a conference right now and am planning on
> updating the KIP again with details from this conversation later this week.
>
> I’ll shoot you a more detailed response then! :)
> On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com> wrote:
>
>> Thanks for the KIP Matt.
>>
>> Regarding the handle interface of ProductionExceptionHandlerResponse,
>> could
>> you write it on the wiki also, along with the actual added config names
>> (e.g. what
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>> described).
>>
>> The question I had about then handle parameters are around the record,
>> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
>> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
>> Object, ? extends Object>`?
>>
>> Also, should the handle function include the `RecordMetadata` as well in
>> case it is not null?
>>
>> We may probably try to write down at least the following handling logic
>> and
>> see if the given API is sufficient for it: 1) throw exception immediately
>> to fail fast and stop the world, 2) log the error and drop record and
>> proceed silently, 3) send such errors to a specific "error" Kafka topic,
>> or
>> record it as an app-level metrics (
>> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
>> monitoring.
>>
>> Guozhang
>>
>>
>>
>> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
>>
>> > I did some more digging tonight.
>> >
>> > @Ted: It looks like the deserialization handler uses
>> > "default.deserialization.exception.handler" for the config name. No
>> > ".class" on the end. I'm inclined to think this should use
>> > "default.production.exception.handler".
>> >
>> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
>> >
>> > > Okay, I've dug into this a little bit.
>> > >
>> > > I think getting access to the serialized record is possible, and
>> changing
>> > > the naming and return type is certainly doable. However, because we're
>> > > hooking into the onCompletion callback we have no guarantee that the
>> > > ProcessorContext state hasn't changed by the time this particular
>> handler
>> > > runs. So I think the signature would change to something like:
>> > >
>> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
>> > record,
>> > > final Exception exception)
>> > >
>> > > Would this be acceptable?
>> > >
>> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
>> > >
>> > >> Ah good idea. Hmmm. I can line up the naming and return type but I’m
>> not
>> > >> sure if I can get my hands on the context and the record itself
>> without
>> > >> other changes.
>> > >>
>> > >> Let me dig in and follow up here tomorrow.
>> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
>> matthias@confluent.io>
>> > >> wrote:
>> > >>
>> > >>> Thanks for the KIP.
>> > >>>
>> > >>> Are you familiar with KIP-161?
>> > >>>
>> > >>>
>> > >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
>> > deserialization+exception+handlers
>> > >>>
>> > >>> I thinks, we should align the design (parameter naming, return
>> types,
>> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
>> > experience.
>> > >>>
>> > >>>
>> > >>>
>> > >>> -Matthias
>> > >>>
>> > >>>
>> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> > >>> > I’ll create the JIRA ticket.
>> > >>> >
>> > >>> > I think that config name will work. I’ll update the KIP
>> accordingly.
>> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com>
>> wrote:
>> > >>> >
>> > >>> >> Can you create JIRA that corresponds to the KIP ?
>> > >>> >>
>> > >>> >> For the new config, how about naming it
>> > >>> >> production.exception.processor.class
>> > >>> >> ? This way it is clear that class name should be specified.
>> > >>> >>
>> > >>> >> Cheers
>> > >>> >>
>> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me>
>> wrote:
>> > >>> >>
>> > >>> >>> Hello everyone,
>> > >>> >>>
>> > >>> >>> This is the discussion thread for the KIP that I just filed
>> here:
>> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>> > >>> >>> Streams+fails+to+produce
>> > >>> >>>
>> > >>> >>> Looking forward to getting some feedback from folks about this
>> idea
>> > >>> and
>> > >>> >>> working toward a solution we can contribute back. :)
>> > >>> >>>
>> > >>> >>> Cheers,
>> > >>> >>> Matt Farmer
>> > >>> >>>
>> > >>> >>
>> > >>> >
>> > >>>
>> > >>>
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Thanks for this feedback. I’m at a conference right now and am planning on
updating the KIP again with details from this conversation later this week.

I’ll shoot you a more detailed response then! :)
On Mon, Oct 23, 2017 at 8:16 PM Guozhang Wang <wa...@gmail.com> wrote:

> Thanks for the KIP Matt.
>
> Regarding the handle interface of ProductionExceptionHandlerResponse, could
> you write it on the wiki also, along with the actual added config names
> (e.g. what
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
> described).
>
> The question I had about then handle parameters are around the record,
> should it be `ProducerRecord<byte[], byte[]>`, or be generics of
> `ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
> Object, ? extends Object>`?
>
> Also, should the handle function include the `RecordMetadata` as well in
> case it is not null?
>
> We may probably try to write down at least the following handling logic and
> see if the given API is sufficient for it: 1) throw exception immediately
> to fail fast and stop the world, 2) log the error and drop record and
> proceed silently, 3) send such errors to a specific "error" Kafka topic, or
> record it as an app-level metrics (
> https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
> monitoring.
>
> Guozhang
>
>
>
> On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:
>
> > I did some more digging tonight.
> >
> > @Ted: It looks like the deserialization handler uses
> > "default.deserialization.exception.handler" for the config name. No
> > ".class" on the end. I'm inclined to think this should use
> > "default.production.exception.handler".
> >
> > On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
> >
> > > Okay, I've dug into this a little bit.
> > >
> > > I think getting access to the serialized record is possible, and
> changing
> > > the naming and return type is certainly doable. However, because we're
> > > hooking into the onCompletion callback we have no guarantee that the
> > > ProcessorContext state hasn't changed by the time this particular
> handler
> > > runs. So I think the signature would change to something like:
> > >
> > > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
> > record,
> > > final Exception exception)
> > >
> > > Would this be acceptable?
> > >
> > > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> > >
> > >> Ah good idea. Hmmm. I can line up the naming and return type but I’m
> not
> > >> sure if I can get my hands on the context and the record itself
> without
> > >> other changes.
> > >>
> > >> Let me dig in and follow up here tomorrow.
> > >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <
> matthias@confluent.io>
> > >> wrote:
> > >>
> > >>> Thanks for the KIP.
> > >>>
> > >>> Are you familiar with KIP-161?
> > >>>
> > >>>
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> > deserialization+exception+handlers
> > >>>
> > >>> I thinks, we should align the design (parameter naming, return types,
> > >>> class names etc) of KIP-210 to KIP-161 to get a unified user
> > experience.
> > >>>
> > >>>
> > >>>
> > >>> -Matthias
> > >>>
> > >>>
> > >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > >>> > I’ll create the JIRA ticket.
> > >>> >
> > >>> > I think that config name will work. I’ll update the KIP
> accordingly.
> > >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com>
> wrote:
> > >>> >
> > >>> >> Can you create JIRA that corresponds to the KIP ?
> > >>> >>
> > >>> >> For the new config, how about naming it
> > >>> >> production.exception.processor.class
> > >>> >> ? This way it is clear that class name should be specified.
> > >>> >>
> > >>> >> Cheers
> > >>> >>
> > >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me>
> wrote:
> > >>> >>
> > >>> >>> Hello everyone,
> > >>> >>>
> > >>> >>> This is the discussion thread for the KIP that I just filed here:
> > >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> > >>> >>> Streams+fails+to+produce
> > >>> >>>
> > >>> >>> Looking forward to getting some feedback from folks about this
> idea
> > >>> and
> > >>> >>> working toward a solution we can contribute back. :)
> > >>> >>>
> > >>> >>> Cheers,
> > >>> >>> Matt Farmer
> > >>> >>>
> > >>> >>
> > >>> >
> > >>>
> > >>>
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks for the KIP Matt.

Regarding the handle interface of ProductionExceptionHandlerResponse, could
you write it on the wiki also, along with the actual added config names
(e.g. what
https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
described).

The question I had about then handle parameters are around the record,
should it be `ProducerRecord<byte[], byte[]>`, or be generics of
`ProducerRecord<? extends K, ? extends V>` or `ProducerRecord<? extends
Object, ? extends Object>`?

Also, should the handle function include the `RecordMetadata` as well in
case it is not null?

We may probably try to write down at least the following handling logic and
see if the given API is sufficient for it: 1) throw exception immediately
to fail fast and stop the world, 2) log the error and drop record and
proceed silently, 3) send such errors to a specific "error" Kafka topic, or
record it as an app-level metrics (
https://kafka.apache.org/documentation/#kafka_streams_monitoring) for
monitoring.

Guozhang



On Fri, Oct 20, 2017 at 5:47 PM, Matt Farmer <ma...@frmr.me> wrote:

> I did some more digging tonight.
>
> @Ted: It looks like the deserialization handler uses
> "default.deserialization.exception.handler" for the config name. No
> ".class" on the end. I'm inclined to think this should use
> "default.production.exception.handler".
>
> On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:
>
> > Okay, I've dug into this a little bit.
> >
> > I think getting access to the serialized record is possible, and changing
> > the naming and return type is certainly doable. However, because we're
> > hooking into the onCompletion callback we have no guarantee that the
> > ProcessorContext state hasn't changed by the time this particular handler
> > runs. So I think the signature would change to something like:
> >
> > ProductionExceptionHandlerResponse handle(final ProducerRecord<..>
> record,
> > final Exception exception)
> >
> > Would this be acceptable?
> >
> > On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
> >
> >> Ah good idea. Hmmm. I can line up the naming and return type but I’m not
> >> sure if I can get my hands on the context and the record itself without
> >> other changes.
> >>
> >> Let me dig in and follow up here tomorrow.
> >> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <ma...@confluent.io>
> >> wrote:
> >>
> >>> Thanks for the KIP.
> >>>
> >>> Are you familiar with KIP-161?
> >>>
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+
> deserialization+exception+handlers
> >>>
> >>> I thinks, we should align the design (parameter naming, return types,
> >>> class names etc) of KIP-210 to KIP-161 to get a unified user
> experience.
> >>>
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 10/18/17 4:20 PM, Matt Farmer wrote:
> >>> > I’ll create the JIRA ticket.
> >>> >
> >>> > I think that config name will work. I’ll update the KIP accordingly.
> >>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:
> >>> >
> >>> >> Can you create JIRA that corresponds to the KIP ?
> >>> >>
> >>> >> For the new config, how about naming it
> >>> >> production.exception.processor.class
> >>> >> ? This way it is clear that class name should be specified.
> >>> >>
> >>> >> Cheers
> >>> >>
> >>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>> >>
> >>> >>> Hello everyone,
> >>> >>>
> >>> >>> This is the discussion thread for the KIP that I just filed here:
> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >>> >>> Streams+fails+to+produce
> >>> >>>
> >>> >>> Looking forward to getting some feedback from folks about this idea
> >>> and
> >>> >>> working toward a solution we can contribute back. :)
> >>> >>>
> >>> >>> Cheers,
> >>> >>> Matt Farmer
> >>> >>>
> >>> >>
> >>> >
> >>>
> >>>
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I did some more digging tonight.

@Ted: It looks like the deserialization handler uses
"default.deserialization.exception.handler" for the config name. No
".class" on the end. I'm inclined to think this should use
"default.production.exception.handler".

On Fri, Oct 20, 2017 at 8:22 PM Matt Farmer <ma...@frmr.me> wrote:

> Okay, I've dug into this a little bit.
>
> I think getting access to the serialized record is possible, and changing
> the naming and return type is certainly doable. However, because we're
> hooking into the onCompletion callback we have no guarantee that the
> ProcessorContext state hasn't changed by the time this particular handler
> runs. So I think the signature would change to something like:
>
> ProductionExceptionHandlerResponse handle(final ProducerRecord<..> record,
> final Exception exception)
>
> Would this be acceptable?
>
> On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:
>
>> Ah good idea. Hmmm. I can line up the naming and return type but I’m not
>> sure if I can get my hands on the context and the record itself without
>> other changes.
>>
>> Let me dig in and follow up here tomorrow.
>> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks for the KIP.
>>>
>>> Are you familiar with KIP-161?
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>>>
>>> I thinks, we should align the design (parameter naming, return types,
>>> class names etc) of KIP-210 to KIP-161 to get a unified user experience.
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>>> > I’ll create the JIRA ticket.
>>> >
>>> > I think that config name will work. I’ll update the KIP accordingly.
>>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:
>>> >
>>> >> Can you create JIRA that corresponds to the KIP ?
>>> >>
>>> >> For the new config, how about naming it
>>> >> production.exception.processor.class
>>> >> ? This way it is clear that class name should be specified.
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
>>> >>
>>> >>> Hello everyone,
>>> >>>
>>> >>> This is the discussion thread for the KIP that I just filed here:
>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>> >>> Streams+fails+to+produce
>>> >>>
>>> >>> Looking forward to getting some feedback from folks about this idea
>>> and
>>> >>> working toward a solution we can contribute back. :)
>>> >>>
>>> >>> Cheers,
>>> >>> Matt Farmer
>>> >>>
>>> >>
>>> >
>>>
>>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Okay, I've dug into this a little bit.

I think getting access to the serialized record is possible, and changing
the naming and return type is certainly doable. However, because we're
hooking into the onCompletion callback we have no guarantee that the
ProcessorContext state hasn't changed by the time this particular handler
runs. So I think the signature would change to something like:

ProductionExceptionHandlerResponse handle(final ProducerRecord<..> record,
final Exception exception)

Would this be acceptable?

On Thu, Oct 19, 2017 at 7:33 PM Matt Farmer <ma...@frmr.me> wrote:

> Ah good idea. Hmmm. I can line up the naming and return type but I’m not
> sure if I can get my hands on the context and the record itself without
> other changes.
>
> Let me dig in and follow up here tomorrow.
> On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
>> Thanks for the KIP.
>>
>> Are you familiar with KIP-161?
>>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>>
>> I thinks, we should align the design (parameter naming, return types,
>> class names etc) of KIP-210 to KIP-161 to get a unified user experience.
>>
>>
>>
>> -Matthias
>>
>>
>> On 10/18/17 4:20 PM, Matt Farmer wrote:
>> > I’ll create the JIRA ticket.
>> >
>> > I think that config name will work. I’ll update the KIP accordingly.
>> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:
>> >
>> >> Can you create JIRA that corresponds to the KIP ?
>> >>
>> >> For the new config, how about naming it
>> >> production.exception.processor.class
>> >> ? This way it is clear that class name should be specified.
>> >>
>> >> Cheers
>> >>
>> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
>> >>
>> >>> Hello everyone,
>> >>>
>> >>> This is the discussion thread for the KIP that I just filed here:
>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>> >>> Streams+fails+to+produce
>> >>>
>> >>> Looking forward to getting some feedback from folks about this idea
>> and
>> >>> working toward a solution we can contribute back. :)
>> >>>
>> >>> Cheers,
>> >>> Matt Farmer
>> >>>
>> >>
>> >
>>
>>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
Ah good idea. Hmmm. I can line up the naming and return type but I’m not
sure if I can get my hands on the context and the record itself without
other changes.

Let me dig in and follow up here tomorrow.
On Thu, Oct 19, 2017 at 7:14 PM Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for the KIP.
>
> Are you familiar with KIP-161?
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers
>
> I thinks, we should align the design (parameter naming, return types,
> class names etc) of KIP-210 to KIP-161 to get a unified user experience.
>
>
>
> -Matthias
>
>
> On 10/18/17 4:20 PM, Matt Farmer wrote:
> > I’ll create the JIRA ticket.
> >
> > I think that config name will work. I’ll update the KIP accordingly.
> > On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:
> >
> >> Can you create JIRA that corresponds to the KIP ?
> >>
> >> For the new config, how about naming it
> >> production.exception.processor.class
> >> ? This way it is clear that class name should be specified.
> >>
> >> Cheers
> >>
> >> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> This is the discussion thread for the KIP that I just filed here:
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 210+-+Provide+for+custom+error+handling++when+Kafka+
> >>> Streams+fails+to+produce
> >>>
> >>> Looking forward to getting some feedback from folks about this idea and
> >>> working toward a solution we can contribute back. :)
> >>>
> >>> Cheers,
> >>> Matt Farmer
> >>>
> >>
> >
>
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks for the KIP.

Are you familiar with KIP-161?

https://cwiki.apache.org/confluence/display/KAFKA/KIP-161%3A+streams+deserialization+exception+handlers

I thinks, we should align the design (parameter naming, return types,
class names etc) of KIP-210 to KIP-161 to get a unified user experience.



-Matthias


On 10/18/17 4:20 PM, Matt Farmer wrote:
> I’ll create the JIRA ticket.
> 
> I think that config name will work. I’ll update the KIP accordingly.
> On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:
> 
>> Can you create JIRA that corresponds to the KIP ?
>>
>> For the new config, how about naming it
>> production.exception.processor.class
>> ? This way it is clear that class name should be specified.
>>
>> Cheers
>>
>> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
>>
>>> Hello everyone,
>>>
>>> This is the discussion thread for the KIP that I just filed here:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 210+-+Provide+for+custom+error+handling++when+Kafka+
>>> Streams+fails+to+produce
>>>
>>> Looking forward to getting some feedback from folks about this idea and
>>> working toward a solution we can contribute back. :)
>>>
>>> Cheers,
>>> Matt Farmer
>>>
>>
> 


Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Matt Farmer <ma...@frmr.me>.
I’ll create the JIRA ticket.

I think that config name will work. I’ll update the KIP accordingly.
On Wed, Oct 18, 2017 at 6:09 PM Ted Yu <yu...@gmail.com> wrote:

> Can you create JIRA that corresponds to the KIP ?
>
> For the new config, how about naming it
> production.exception.processor.class
> ? This way it is clear that class name should be specified.
>
> Cheers
>
> On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:
>
> > Hello everyone,
> >
> > This is the discussion thread for the KIP that I just filed here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 210+-+Provide+for+custom+error+handling++when+Kafka+
> > Streams+fails+to+produce
> >
> > Looking forward to getting some feedback from folks about this idea and
> > working toward a solution we can contribute back. :)
> >
> > Cheers,
> > Matt Farmer
> >
>

Re: [DISCUSS] KIP-210: Provide for custom error handling when Kafka Streams fails to produce

Posted by Ted Yu <yu...@gmail.com>.
Can you create JIRA that corresponds to the KIP ?

For the new config, how about naming it production.exception.processor.class
? This way it is clear that class name should be specified.

Cheers

On Wed, Oct 18, 2017 at 2:40 PM, Matt Farmer <ma...@frmr.me> wrote:

> Hello everyone,
>
> This is the discussion thread for the KIP that I just filed here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 210+-+Provide+for+custom+error+handling++when+Kafka+
> Streams+fails+to+produce
>
> Looking forward to getting some feedback from folks about this idea and
> working toward a solution we can contribute back. :)
>
> Cheers,
> Matt Farmer
>