Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2021/09/23 23:42:45 UTC

byte array as keys in Flink

*Context*
I want to perform joins based on UUIDs.  The String version is less efficient,
so I figured I should use the byte[] version.  I did a shallow dive into
the Flink code, and I'm not sure it's safe to use byte[] as a key (since it
uses Object equals/hashCode).

*Request*
How do other Flink devs handle byte[] keys? I want to use byte[] as a key
in a MapState.

Re: byte array as keys in Flink

Posted by Dan Hill <qu...@gmail.com>.
Thanks, everyone!

This all makes sense and is why I sent the email.  I think UUID would work.
We also talked about BigInteger and ByteBuffer.
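
For reference, a minimal plain-Java sketch of the two wrapper ideas mentioned here. It only shows that ByteBuffer and BigInteger compare by content; it makes no claim about how Flink would serialize either type. The class and method names (ByteKeyWrappers, asBufferKey, asBigIntegerKey) are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ByteKeyWrappers {
    /** Wraps a copy of the bytes so later changes to the source array cannot alter the key. */
    public static ByteBuffer asBufferKey(byte[] bytes) {
        return ByteBuffer.wrap(bytes.clone());
    }

    /** Reads the bytes as an unsigned big-endian number; leading zero bytes are not preserved. */
    public static BigInteger asBigIntegerKey(byte[] bytes) {
        return new BigInteger(1, bytes);
    }

    public static void main(String[] args) {
        byte[] a = {0, 1, 2, 3};
        byte[] b = {0, 1, 2, 3};

        // ByteBuffer.equals/hashCode look at the remaining content, not the identity.
        Map<ByteBuffer, String> map = new HashMap<>();
        map.put(asBufferKey(a), "value");
        System.out.println(map.get(asBufferKey(b)));

        // BigInteger also compares by value.
        System.out.println(asBigIntegerKey(a).equals(asBigIntegerKey(b)));

        // Caveat: arrays that differ only in leading zeros collapse to the same BigInteger.
        System.out.println(
            asBigIntegerKey(new byte[] {0, 1}).equals(asBigIntegerKey(new byte[] {1})));
    }
}
```

For fixed-length 16-byte UUIDs the leading-zero caveat does not cause collisions, but the original length has to be known to convert back to bytes.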



RE: byte array as keys in Flink

Posted by Schwalbe Matthias <Ma...@viseca.ch>.
Hi Dan,

Did you consider using java.util.UUID as the key type? It consists of two longs, which should perform well as a key.
TypeInformation will map to GenericTypeInfo, i.e. it uses KryoSerializer, unless you register a specific TypeInformation for this class …

I didn’t give it a try … keep us posted if that works 😊

Thias
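
A minimal sketch of the conversion this suggestion implies, assuming the UUIDs arrive as 16-byte big-endian arrays (most significant 8 bytes first). The class and method names (UuidKeys, fromBytes, toBytes) are hypothetical, and nothing here touches Flink itself.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidKeys {
    /** Builds a UUID from a 16-byte big-endian array. */
    public static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    /** Writes the two longs of a UUID back into a 16-byte big-endian array. */
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        UUID roundTripped = fromBytes(toBytes(original));
        // UUID.equals and hashCode are value-based, so the round trip compares equal.
        System.out.println(original.equals(roundTripped));
    }
}
```

Because UUID defines value-based equals and hashCode, two instances built from equal byte content behave as the same key in any hash-based structure.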



Re: byte array as keys in Flink

Posted by Guowei Ma <gu...@gmail.com>.
Hi Hill

As far as I know, you cannot use byte[] as a keyBy key. You can find more
information in [1].

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#keyby

Best,
Guowei



Re: byte array as keys in Flink

Posted by Caizhi Weng <ts...@gmail.com>.
Hi!

It depends on the state backend you use. For example, if you use a heap
memory state backend, it is backed by a hash map and uses the hash code
of the byte[] to compare two byte[] values (see HeapMapState#put). Java
arrays inherit identity-based hashCode/equals from Object, so two byte[]
with the same content do not match there. The RocksDB state backend,
however, compares the serialized bytes (that is to say, the content of the
byte[]) with the records, so two byte[] with the same content can match
(see RocksDBMapState#put).
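
The hash-map behaviour can be reproduced with a plain java.util.HashMap. The sketch below is only an analogy for a heap-backed map, not Flink's HeapMapState, and the class and method names (ByteArrayKeyDemo, foundByRawArray, sameContent) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeyDemo {
    /** Stores one array as a key, then checks whether a second array finds it. */
    public static boolean foundByRawArray(byte[] stored, byte[] lookup) {
        Map<byte[], String> map = new HashMap<>();
        map.put(stored, "value");
        // HashMap relies on hashCode/equals, which for arrays are identity-based.
        return map.containsKey(lookup);
    }

    /** Content-based comparison, roughly what comparing serialized bytes amounts to. */
    public static boolean sameContent(byte[] x, byte[] y) {
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};

        System.out.println(foundByRawArray(a, a)); // same instance
        System.out.println(foundByRawArray(a, b)); // equal content, different instance
        System.out.println(sameContent(a, b));     // content comparison
    }
}
```

The lookup with a different instance fails even though the content is equal, which is why a byte[] key is only safe when every lookup goes through a content-based comparison.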
