Posted to user@flink.apache.org by Elias Levy <fe...@gmail.com> on 2016/06/10 22:03:57 UTC

Arrays values in keyBy

It would be useful if the documentation warned what type of equality is
expected of values used as keys in keyBy.  I just got bit in the ass by
converting a field from a string to a byte array.  All of a sudden the
windows were no longer aggregating.  So it seems Flink is not doing a deep
compare of arrays when comparing keys.
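
The symptom can be reproduced without Flink. The sketch below groups keys in a plain java.util.HashMap, which, like the keying described in this thread, relies on hashCode() and equals(). The class and method names (ArrayKeyGrouping, distinctGroups) are made up for illustration and are not Flink API. Two Strings with the same characters land in one group, while two byte arrays with the same contents stay separate, because Java arrays inherit identity-based equals() from Object.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of Flink.
final class ArrayKeyGrouping {
    // Counts how many groups result when keys are matched via
    // hashCode()/equals(), which is what HashMap does.
    static int distinctGroups(Object[] keys) {
        Map<Object, Integer> counts = new HashMap<>();
        for (Object key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts.size();
    }

    static Object[] stringKeys() {
        // new String(...) forces two distinct objects; String compares by value.
        return new Object[] { new String("user-42"), new String("user-42") };
    }

    static Object[] byteArrayKeys() {
        // getBytes returns a fresh array on every call; arrays compare by identity.
        return new Object[] {
            "user-42".getBytes(StandardCharsets.UTF_8),
            "user-42".getBytes(StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        System.out.println("groups with String keys: " + distinctGroups(stringKeys()));
        System.out.println("groups with byte[] keys: " + distinctGroups(byteArrayKeys()));
    }
}
```

With the String keys the two records fall into a single group; with the byte[] keys each record ends up in its own group, which matches the "no longer aggregating" behaviour.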

Re: Arrays values in keyBy

Posted by Robert Metzger <rm...@apache.org>.
I've filed a JIRA for this issue:
https://issues.apache.org/jira/browse/FLINK-5874

On Wed, Jul 20, 2016 at 4:32 PM, Stephan Ewen <se...@apache.org> wrote:

> I think we can simply add this behavior when we use the TypeComparator in
> the keyBy() function. It can implement the hashCode() as a deepHashCode on
> array types.

Re: Arrays values in keyBy

Posted by Stephan Ewen <se...@apache.org>.
I think we can simply add this behavior when we use the TypeComparator in
the keyBy() function. It can implement the hashCode() as a deepHashCode on
array types.
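
As a rough user-side illustration of content-based hashing for array keys, the sketch below wraps a byte array in a small class whose equals() and hashCode() delegate to java.util.Arrays. BytesKey and nestedHash are hypothetical names and not Flink classes. How a given Flink version classifies and serializes such a wrapper type has not been checked here, so treat it as an assumption to verify before keying on it.

```java
import java.util.Arrays;

// Hypothetical wrapper, not part of Flink: gives a byte[] value-based
// equals()/hashCode() so equal contents produce equal keys.
final class BytesKey {
    private final byte[] bytes;

    BytesKey(byte[] bytes) {
        // Defensive copy so the key cannot change after it has been hashed.
        this.bytes = bytes.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof BytesKey
                && Arrays.equals(bytes, ((BytesKey) other).bytes);
    }

    @Override
    public int hashCode() {
        // Content-based hash, unlike the identity hash of a raw array.
        return Arrays.hashCode(bytes);
    }

    // For nested arrays the content-based counterpart is Arrays.deepHashCode.
    static int nestedHash(byte[][] nested) {
        return Arrays.deepHashCode(nested);
    }
}
```

Two BytesKey instances built from separate arrays with identical contents compare equal and hash identically, so they would be grouped together by anything that relies on hashCode() and equals().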

On Mon, Jun 13, 2016 at 12:30 PM, Ufuk Celebi <uc...@apache.org> wrote:

> Would make sense to update the Javadocs for the next release.

Re: Arrays values in keyBy

Posted by Ufuk Celebi <uc...@apache.org>.
Would make sense to update the Javadocs for the next release.

On Mon, Jun 13, 2016 at 11:19 AM, Aljoscha Krettek <al...@apache.org> wrote:
> Yes, this is correct. Right now we're basically using <key>.hashCode() for
> keying. (Which can be problematic in some cases.)
>
> Beam, for example, clearly specifies that the encoded form of a value should
> be used for all comparisons/hashing. This is more well defined but can lead
> to slow performance in some cases.

Re: Arrays values in keyBy

Posted by Aljoscha Krettek <al...@apache.org>.
Yes, this is correct. Right now we're basically using <key>.hashCode() for
keying. (Which can be problematic in some cases.)

Beam, for example, clearly specifies that the encoded form of a value
should be used for all comparisons/hashing. This is better defined but
can lead to slow performance in some cases.
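
To make the "encoded form" idea concrete, here is a minimal sketch that serializes a key with java.io and then hashes and compares the resulting bytes. It does not use Beam's coder API. EncodedFormKey, its methods, and the (shard, payload) key layout are invented for this example. It also shows where the cost comes from: every hash or comparison first has to encode the value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration, not Beam or Flink API.
final class EncodedFormKey {
    // Encodes a (shard, payload) key into a deterministic byte sequence.
    static byte[] encode(int shard, byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(shard);
            out.writeInt(payload.length); // length prefix keeps the encoding unambiguous
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hash of the encoded form: identical contents always give the same hash.
    static int encodedHash(int shard, byte[] payload) {
        return Arrays.hashCode(encode(shard, payload));
    }

    // Equality of the encoded forms; note that both sides are encoded on every call.
    static boolean encodedEquals(int shardA, byte[] payloadA, int shardB, byte[] payloadB) {
        return Arrays.equals(encode(shardA, payloadA), encode(shardB, payloadB));
    }
}
```

Because the comparison only ever sees bytes, it no longer matters how the key type implements hashCode() or equals(), at the price of an extra serialization step per operation.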

On Sat, 11 Jun 2016 at 00:04 Elias Levy <fe...@gmail.com> wrote:

> It would be useful if the documentation warned what type of equality is
> expected of values used as keys in keyBy.  I just got bit in the ass by
> converting a field from a string to a byte array.  All of a sudden the
> windows were no longer aggregating.  So it seems Flink is not doing a deep
> compare of arrays when comparing keys.