You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2012/09/14 18:10:45 UTC

Broker computes message crcs twice

Hey guys,

I noticed that our ByteBufferMessageSet iterator computes the message CRC
on iteration. Log.append also loops through all messages and checks the
same checksum. This means we are effectively computing the CRC32 twice
inside Log.append (we may actually be doing it more than that we would have
to look for other times we loop through the message set). This is actually
quite expensive. Does anyone remember why the second check was added? I am
not completely sure of the right fix--intuitively I think we should remove
the automatic check in the iterator and instead add a check in the
high-level consumer. I think it is okay for the simple consumer to be left
to manually call isValid or not since it is a lower-level interface.

Thoughts?

-Jay

Re: Broker computes message crcs twice

Posted by Jun Rao <ju...@gmail.com>.
Yes, we should remove the crc check in the iterator of
ByteBufferMessageSet. Log.append already checks crc, which protects the
server. Consumer, if wants to, could explicitly check the crc on each
message on its own.

Thanks,

Jun

On Fri, Sep 14, 2012 at 9:19 AM, Jay Kreps <ja...@gmail.com> wrote:

> Okay, I checked on this. The producer computes the checksum twice, which is
> once too many. The server computes the checksum an astounding 39 times in
> 0.8 branch! I think this answers the earlier questions: The right thing to
> do is clearly to remove the check from the iterator as people are iterating
> many times.
>
> -Jay
>
> On Fri, Sep 14, 2012 at 9:12 AM, Jay Kreps <ja...@gmail.com> wrote:
>
> > I actually think the validBytes() call is looping and hence computing
> CRCs
> > too. So actually at least three times.
> >
> > -Jay
> >
> >
> > On Fri, Sep 14, 2012 at 9:10 AM, Jay Kreps <ja...@gmail.com> wrote:
> >
> >> Hey guys,
> >>
> >> I noticed that our ByteBufferMessageSet iterator computes the message
> CRC
> >> on iteration. Log.append also loops through all messages and checks the
> >> same checksum. This means we are effectively computing the CRC32 twice
> >> inside Log.append (we may actually be doing it more than that we would
> have
> >> to look for other times we loop through the message set). This is
> actually
> >> quite expensive. Does anyone remember why the second check was added? I
> am
> >> not completely sure of the right fix--intuitively I think we should
> remove
> >> the automatic check in the iterator and instead add a check in the
> >> high-level consumer. I think it is okay for the simple consumer to be
> left
> >> to manually call isValid or not since it is a lower-level interface.
> >>
> >> Thoughts?
> >>
> >> -Jay
> >>
> >
> >
>

Re: Broker computes message crcs twice

Posted by Jay Kreps <ja...@gmail.com>.
Okay, I checked on this. The producer computes the checksum twice, which is
once too many. The server computes the checksum an astounding 39 times in
0.8 branch! I think this answers the earlier questions: The right thing to
do is clearly to remove the check from the iterator as people are iterating
many times.

-Jay

On Fri, Sep 14, 2012 at 9:12 AM, Jay Kreps <ja...@gmail.com> wrote:

> I actually think the validBytes() call is looping and hence computing CRCs
> too. So actually at least three times.
>
> -Jay
>
>
> On Fri, Sep 14, 2012 at 9:10 AM, Jay Kreps <ja...@gmail.com> wrote:
>
>> Hey guys,
>>
>> I noticed that our ByteBufferMessageSet iterator computes the message CRC
>> on iteration. Log.append also loops through all messages and checks the
>> same checksum. This means we are effectively computing the CRC32 twice
>> inside Log.append (we may actually be doing it more than that we would have
>> to look for other times we loop through the message set). This is actually
>> quite expensive. Does anyone remember why the second check was added? I am
>> not completely sure of the right fix--intuitively I think we should remove
>> the automatic check in the iterator and instead add a check in the
>> high-level consumer. I think it is okay for the simple consumer to be left
>> to manually call isValid or not since it is a lower-level interface.
>>
>> Thoughts?
>>
>> -Jay
>>
>
>

Re: Broker computes message crcs twice

Posted by Jay Kreps <ja...@gmail.com>.
I actually think the validBytes() call is looping and hence computing CRCs
too. So actually at least three times.

-Jay

On Fri, Sep 14, 2012 at 9:10 AM, Jay Kreps <ja...@gmail.com> wrote:

> Hey guys,
>
> I noticed that our ByteBufferMessageSet iterator computes the message CRC
> on iteration. Log.append also loops through all messages and checks the
> same checksum. This means we are effectively computing the CRC32 twice
> inside Log.append (we may actually be doing it more than that we would have
> to look for other times we loop through the message set). This is actually
> quite expensive. Does anyone remember why the second check was added? I am
> not completely sure of the right fix--intuitively I think we should remove
> the automatic check in the iterator and instead add a check in the
> high-level consumer. I think it is okay for the simple consumer to be left
> to manually call isValid or not since it is a lower-level interface.
>
> Thoughts?
>
> -Jay
>