You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Guozhang Wang <wa...@gmail.com> on 2019/05/11 19:34:16 UTC

[DISCUSS] KIP-467: Augment ProduceResponse error messaging

Hello everyone,

I'd like to start a discussion thread on this newly created KIP to improve
error communication and handling for producer response:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records

Thanks,
-- 
-- Guozhang

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Magnus Edenhill <ma...@edenhill.se>.
And a question regarding transactions:

what is the effect of a failing record on a transaction, do we rely on the
application to abort the transaction as needed?
This is a behavioural change since today an application can rely on the
underlying idempotent producer
to be sure that all messages in a transaction were indeed successfully
produced (if not a fatal error is raised),
but with this change messages may be skipped, requiring an application to
add logic to call abortTransactions()
if messages fail to produce. Failing to do so will risk transaction
atomicity.

/Magnus




Den mån 9 sep. 2019 kl 08:49 skrev Magnus Edenhill <ma...@edenhill.se>:

> Hey Guozhang,
>
> I'm late to the game, again, but if it is not too late I'd recommend that
> instead
> of having a single error code for all failed records, instead provide
> per-record
> error codes by having an array of [relative_offset, error_code], where
> they're
> both varint-encoded for space-efficiency.
>
> Regards,
> Magnus
> just a couple of months late.
>
> Den ons 17 juli 2019 kl 00:18 skrev Guozhang Wang <wa...@gmail.com>:
>
>> Hello folks,
>>
>> I've updated the wiki a bit more and here are the changes:
>>
>> 1. *InvalidRecordException* would inherit from ApiException (which is
>> already non-retriable), not KafkaException.
>>
>> 2. Explained how to set the error codes if there are multiple error
>> scenarios on the broker-side.
>>
>>
>> I'm going to start the voting thread soon.
>>
>> Guozhang
>>
>>
>> On Thu, Jun 20, 2019 at 4:24 PM Guozhang Wang <wa...@gmail.com> wrote:
>>
>> > Hi Jun,
>> >
>> > Thanks for your comments.
>> >
>> > 1. Yeah I think APIException would not make a distinct call here
>> anymore,
>> > and what really matters is the RetriableException. Updated the wiki.
>> >
>> > 2. Makes sense. Updated the wiki.
>> >
>> > 3. My current thoughts is to return the first ever hit error for that
>> > partition, and also the encoded error_records would only includes the
>> > relative offsets of records that hitting that exact error as well.
>> >
>> >
>> > Guozhang
>> >
>> > On Wed, Jun 12, 2019 at 3:38 PM Jun Rao <ju...@confluent.io> wrote:
>> >
>> >> Hi, Guozhang,
>> >>
>> >> Thanks for the KIP. A few comments below.
>> >>
>> >> 1. "If the error_records is not empty and the error code is not API
>> >> exception and is not retriable, still retry by creating a new batch ".
>> >> InvalidTimestampException
>> >> is an ApiException. It seems we should still retry the non-error
>> records
>> >> in
>> >> the batch.
>> >>
>> >> 2. error_records => [INT64] : Since we don't have more than 2 billion
>> >> messages per batch, we can just use INT32. It would also be useful to
>> >> describe what those numbers are. I guess they are the relative offset
>> in
>> >> the batch?
>> >>
>> >> 3. It's possible that a batch of records hit more than one type of
>> error
>> >> for different records, which error code and error message should the
>> >> server
>> >> return to the client?
>> >>
>> >> Jun
>> >>
>> >> On Sat, May 11, 2019 at 12:44 PM Guozhang Wang <wa...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hello everyone,
>> >> >
>> >> > I'd like to start a discussion thread on this newly created KIP to
>> >> improve
>> >> > error communication and handling for producer response:
>> >> >
>> >> >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
>> >> >
>> >> > Thanks,
>> >> > --
>> >> > -- Guozhang
>> >> >
>> >>
>> >
>> >
>> > --
>> > -- Guozhang
>> >
>>
>>
>> --
>> -- Guozhang
>>
>

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Magnus Edenhill <ma...@edenhill.se>.
Hey Guozhang,

I'm late to the game, again, but if it is not too late I'd recommend that
instead
of having a single error code for all failed records, instead provide
per-record
error codes by having an array of [relative_offset, error_code], where
they're
both varint-encoded for space-efficiency.

Regards,
Magnus
just a couple of months late.

Den ons 17 juli 2019 kl 00:18 skrev Guozhang Wang <wa...@gmail.com>:

> Hello folks,
>
> I've updated the wiki a bit more and here are the changes:
>
> 1. *InvalidRecordException* would inherit from ApiException (which is
> already non-retriable), not KafkaException.
>
> 2. Explained how to set the error codes if there are multiple error
> scenarios on the broker-side.
>
>
> I'm going to start the voting thread soon.
>
> Guozhang
>
>
> On Thu, Jun 20, 2019 at 4:24 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > Thanks for your comments.
> >
> > 1. Yeah I think APIException would not make a distinct call here anymore,
> > and what really matters is the RetriableException. Updated the wiki.
> >
> > 2. Makes sense. Updated the wiki.
> >
> > 3. My current thoughts is to return the first ever hit error for that
> > partition, and also the encoded error_records would only includes the
> > relative offsets of records that hitting that exact error as well.
> >
> >
> > Guozhang
> >
> > On Wed, Jun 12, 2019 at 3:38 PM Jun Rao <ju...@confluent.io> wrote:
> >
> >> Hi, Guozhang,
> >>
> >> Thanks for the KIP. A few comments below.
> >>
> >> 1. "If the error_records is not empty and the error code is not API
> >> exception and is not retriable, still retry by creating a new batch ".
> >> InvalidTimestampException
> >> is an ApiException. It seems we should still retry the non-error records
> >> in
> >> the batch.
> >>
> >> 2. error_records => [INT64] : Since we don't have more than 2 billion
> >> messages per batch, we can just use INT32. It would also be useful to
> >> describe what those numbers are. I guess they are the relative offset in
> >> the batch?
> >>
> >> 3. It's possible that a batch of records hit more than one type of error
> >> for different records, which error code and error message should the
> >> server
> >> return to the client?
> >>
> >> Jun
> >>
> >> On Sat, May 11, 2019 at 12:44 PM Guozhang Wang <wa...@gmail.com>
> >> wrote:
> >>
> >> > Hello everyone,
> >> >
> >> > I'd like to start a discussion thread on this newly created KIP to
> >> improve
> >> > error communication and handling for producer response:
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
> >> >
> >> > Thanks,
> >> > --
> >> > -- Guozhang
> >> >
> >>
> >
> >
> > --
> > -- Guozhang
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Guozhang Wang <wa...@gmail.com>.
Hello folks,

I've updated the wiki a bit more and here are the changes:

1. *InvalidRecordException* would inherit from ApiException (which is
already non-retriable), not KafkaException.

2. Explained how to set the error codes if there are multiple error
scenarios on the broker-side.


I'm going to start the voting thread soon.

Guozhang


On Thu, Jun 20, 2019 at 4:24 PM Guozhang Wang <wa...@gmail.com> wrote:

> Hi Jun,
>
> Thanks for your comments.
>
> 1. Yeah I think APIException would not make a distinct call here anymore,
> and what really matters is the RetriableException. Updated the wiki.
>
> 2. Makes sense. Updated the wiki.
>
> 3. My current thoughts is to return the first ever hit error for that
> partition, and also the encoded error_records would only includes the
> relative offsets of records that hitting that exact error as well.
>
>
> Guozhang
>
> On Wed, Jun 12, 2019 at 3:38 PM Jun Rao <ju...@confluent.io> wrote:
>
>> Hi, Guozhang,
>>
>> Thanks for the KIP. A few comments below.
>>
>> 1. "If the error_records is not empty and the error code is not API
>> exception and is not retriable, still retry by creating a new batch ".
>> InvalidTimestampException
>> is an ApiException. It seems we should still retry the non-error records
>> in
>> the batch.
>>
>> 2. error_records => [INT64] : Since we don't have more than 2 billion
>> messages per batch, we can just use INT32. It would also be useful to
>> describe what those numbers are. I guess they are the relative offset in
>> the batch?
>>
>> 3. It's possible that a batch of records hit more than one type of error
>> for different records, which error code and error message should the
>> server
>> return to the client?
>>
>> Jun
>>
>> On Sat, May 11, 2019 at 12:44 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>>
>> > Hello everyone,
>> >
>> > I'd like to start a discussion thread on this newly created KIP to
>> improve
>> > error communication and handling for producer response:
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
>> >
>> > Thanks,
>> > --
>> > -- Guozhang
>> >
>>
>
>
> --
> -- Guozhang
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Jun,

Thanks for your comments.

1. Yeah I think APIException would not make a distinct call here anymore,
and what really matters is the RetriableException. Updated the wiki.

2. Makes sense. Updated the wiki.

3. My current thoughts is to return the first ever hit error for that
partition, and also the encoded error_records would only includes the
relative offsets of records that hitting that exact error as well.


Guozhang

On Wed, Jun 12, 2019 at 3:38 PM Jun Rao <ju...@confluent.io> wrote:

> Hi, Guozhang,
>
> Thanks for the KIP. A few comments below.
>
> 1. "If the error_records is not empty and the error code is not API
> exception and is not retriable, still retry by creating a new batch ".
> InvalidTimestampException
> is an ApiException. It seems we should still retry the non-error records in
> the batch.
>
> 2. error_records => [INT64] : Since we don't have more than 2 billion
> messages per batch, we can just use INT32. It would also be useful to
> describe what those numbers are. I guess they are the relative offset in
> the batch?
>
> 3. It's possible that a batch of records hit more than one type of error
> for different records, which error code and error message should the server
> return to the client?
>
> Jun
>
> On Sat, May 11, 2019 at 12:44 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I'd like to start a discussion thread on this newly created KIP to
> improve
> > error communication and handling for producer response:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
> >
> > Thanks,
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Jun Rao <ju...@confluent.io>.
Hi, Guozhang,

Thanks for the KIP. A few comments below.

1. "If the error_records is not empty and the error code is not API
exception and is not retriable, still retry by creating a new batch ".
InvalidTimestampException
is an ApiException. It seems we should still retry the non-error records in
the batch.

2. error_records => [INT64] : Since we don't have more than 2 billion
messages per batch, we can just use INT32. It would also be useful to
describe what those numbers are. I guess they are the relative offset in
the batch?

3. It's possible that a batch of records hit more than one type of error
for different records, which error code and error message should the server
return to the client?

Jun

On Sat, May 11, 2019 at 12:44 PM Guozhang Wang <wa...@gmail.com> wrote:

> Hello everyone,
>
> I'd like to start a discussion thread on this newly created KIP to improve
> error communication and handling for producer response:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
>
> Thanks,
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Guozhang Wang <wa...@gmail.com>.
Hi Kamal,

I intend to scope KIP-467 for producer <--> brokers interaction only, so
that we can have a small enough one to discuss and tackle.

I've just read about KIP-334 as well and I feel it's still better to be
addressed independently by itself. LMK what you think.


Guozhang

On Mon, Jun 10, 2019 at 6:33 AM Kamal Chandraprakash <
kamal.chandraprakash@gmail.com> wrote:

> This KIP is inline with KIP-334
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793
> >
> in which it's proposed that the consumer will throw FaultyRecordException
> with corrupt/in-operative record offset and topic-partition on encountering
> the invalid record.
>
> In this KIP proposed changes, you've mentioned to skip the corrupted record
> while producing the RecordBatch,
> Will you also handle the same case while consuming the records by providing
> a callback to the consumer to
> either skip or halt the processing? (This is a follow-up idea of KIP-334
> which seems relevant to this one)
>
> On Sat, Jun 8, 2019 at 5:29 AM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Bump up this thread again for whoever's interested.
> >
> > On Sat, May 11, 2019 at 12:34 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> >
> > > Hello everyone,
> > >
> > > I'd like to start a discussion thread on this newly created KIP to
> > improve
> > > error communication and handling for producer response:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
> > >
> > > Thanks,
> > > --
> > > -- Guozhang
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Kamal Chandraprakash <ka...@gmail.com>.
This KIP is inline with KIP-334
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=87297793>
in which it's proposed that the consumer will throw FaultyRecordException
with corrupt/in-operative record offset and topic-partition on encountering
the invalid record.

In this KIP proposed changes, you've mentioned to skip the corrupted record
while producing the RecordBatch,
Will you also handle the same case while consuming the records by providing
a callback to the consumer to
either skip or halt the processing? (This is a follow-up idea of KIP-334
which seems relevant to this one)

On Sat, Jun 8, 2019 at 5:29 AM Guozhang Wang <wa...@gmail.com> wrote:

> Bump up this thread again for whoever's interested.
>
> On Sat, May 11, 2019 at 12:34 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Hello everyone,
> >
> > I'd like to start a discussion thread on this newly created KIP to
> improve
> > error communication and handling for producer response:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
> >
> > Thanks,
> > --
> > -- Guozhang
> >
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-467: Augment ProduceResponse error messaging

Posted by Guozhang Wang <wa...@gmail.com>.
Bump up this thread again for whoever's interested.

On Sat, May 11, 2019 at 12:34 PM Guozhang Wang <wa...@gmail.com> wrote:

> Hello everyone,
>
> I'd like to start a discussion thread on this newly created KIP to improve
> error communication and handling for producer response:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
>
> Thanks,
> --
> -- Guozhang
>


-- 
-- Guozhang