Posted to dev@beam.apache.org by Lukasz Cwik <lc...@google.com> on 2019/07/12 15:20:20 UTC

Re: [Java] Using a complex datastructure as Key for KV

Additional coders would be useful. Note that we usually don't have coders
for specific collection types like ArrayList, but prefer to have coders for
their general counterparts like List, Map, Iterable, etc.

There has been discussion in the past about making MapCoder deterministic
in cases where a deterministic coder is required. There are a few people
working on schema support within Apache Beam who might be able to provide
guidance (+Reuven Lax <re...@google.com> +Brian Hulette
<bh...@google.com>).
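
For readers following the thread, a minimal stand-alone sketch of why the
map's encoding matters here. GroupByKey compares keys by their encoded
bytes, so two equal maps must always encode to the same bytes. The sketch
below uses only the JDK; it is not Beam's Coder API, and the class name
MapEncodingSketch and the helper encode() are hypothetical, made up for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Stand-alone illustration only; this is NOT Beam's Coder API.
class MapEncodingSketch {

  // Hypothetical helper: writes the entry count, then each key and value
  // in the map's own iteration order.
  static byte[] encode(Map<String, Integer> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(map.size());
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    // Same entries, inserted in a different order.
    Map<String, Integer> a = new LinkedHashMap<>();
    a.put("x", 1);
    a.put("y", 2);
    Map<String, Integer> b = new LinkedHashMap<>();
    b.put("y", 2);
    b.put("x", 1);

    // Equal as maps, but the iteration order differs, so the bytes differ.
    System.out.println(a.equals(b));                          // true
    System.out.println(Arrays.equals(encode(a), encode(b)));  // false

    // Copying into a TreeMap fixes the order by key before encoding.
    System.out.println(
        Arrays.equals(encode(new TreeMap<>(a)), encode(new TreeMap<>(b)))); // true
  }
}
```

This also suggests why a sorted map is the easier case for a deterministic
coder: its iteration order depends only on the keys, not on how the map was
built. The key and value coders still need to be deterministic themselves.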

On Fri, Jul 12, 2019 at 11:05 AM Shannon Duncan <jo...@liveramp.com>
wrote:

> I have a working TreeMapCoder now. Got it all set up and done, and the
> GroupByKey is accepting it.
>
> Thanks for all the help. I need to read up more on contributing guidelines
> then I'll PR the coder into the SDK. Also willing to write coders for
> things such as ArrayList etc if people want them.
>
> On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan <jo...@liveramp.com>
> wrote:
>
>> Aha, makes sense. Thanks!
>>
>> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>>>
>>> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <
>>> joseph.duncan@liveramp.com> wrote:
>>>
>>>> So I have my custom coder created for TreeMap and I'm ready to set it...
>>>>
>>>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>>>
>>>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>>>
>>>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ru...@google.com> wrote:
>>>>
>>>>> Hi Shannon,  [1] will be a good start on coder in Java SDK.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>>>
>>>>> Rui
>>>>>
>>>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>
>>>>>> Was able to get it to use ArrayList by doing List<List<Integer>>
>>>>>> result = new ArrayList<List<Integer>>();
>>>>>>
>>>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>>>> input to key for the list of lists.
>>>>>>
>>>>>> Thanks for the help, lemme know more in the future about how coders
>>>>>> work and instantiate and I'd love to help contribute by adding some new
>>>>>> coders.
>>>>>>
>>>>>> - Shannon
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>
>>>>>>> Will do. Thanks. A new coder for deterministic Maps would be great
>>>>>>> in the future. Thank you!
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>
>>>>>>>> I think Mike refers to ListCoder
>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java> which
>>>>>>>> is deterministic if its element is the same. Maybe you can search the repo
>>>>>>>> for examples of ListCoder?
>>>>>>>>
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>>>
>>>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ru...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Shannon, I agree with Mike on List is a good workaround if your
>>>>>>>>>> element within list is deterministic and you are eager to make your new
>>>>>>>>>> pipeline working.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Let me send back some pointers to adding new coder later.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Rui
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I just started learning Java today to attempt to convert our
>>>>>>>>>>> python pipelines to Java to take advantage of key features that Java has. I
>>>>>>>>>>> have no idea how I would create a new coder and include it in for beam to
>>>>>>>>>>> recognize.
>>>>>>>>>>>
>>>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>>>> together I might be able to figure that out. I can duplicate MapCoder and
>>>>>>>>>>> try to make changes, but how will beam know to pick up that coder for a
>>>>>>>>>>> groupByKey?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Shannon
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ru...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It could be just straightforward to create a SortedMapCoder for
>>>>>>>>>>>> TreeMap. Just add checks on map instances and then change
>>>>>>>>>>>> verifyDeterministic.
>>>>>>>>>>>>
>>>>>>>>>>>> If this is a common need we could just submit it into Beam repo.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>>>>> mike@mikepedersen.dk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if
>>>>>>>>>>>>> your datastructure is deterministic, Beam will assume the serialized bytes
>>>>>>>>>>>>> aren't deterministic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>>>> Just change it such that the exception in VerifyDeterministic
>>>>>>>>>>>>> is removed and when decoding it instantiates a TreeMap or such instead of a
>>>>>>>>>>>>> HashMap.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>
>>>>>>>>>>>>> Den tor. 11. jul. 2019 kl. 22.41 skrev Shannon Duncan <
>>>>>>>>>>>>> joseph.duncan@liveramp.com>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I'm working on essentially doing a word-count on a complex
>>>>>>>>>>>>>> data structure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I tried just using a HashMap as the Structure, but that
>>>>>>>>>>>>>> didn't work because it is non-deterministic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However when Given a LinkedHashMap or TreeMap which is
>>>>>>>>>>>>>> deterministic the SDK complains that it's non-deterministic when trying to
>>>>>>>>>>>>>> use it as a key for GroupByKey.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What would be an appropriate Map style data structure that
>>>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Shannon
>>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: [Java] Using a complex datastructure as Key for KV

Posted by Shannon Duncan <jo...@liveramp.com>.
I tried to pass ArrayList in and it wouldn't generalize it to List. It
required me to convert my ArrayLists to Lists.
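
A likely explanation, sketched with the JDK only: Java generics are
invariant, so something typed for List<Integer> is not accepted where
something typed for ArrayList<Integer> is expected, even though every
ArrayList is a List. Codec below is a hypothetical stand-in for a coder
interface, not Beam's Coder, and ListTypeSketch and listCodec() are made-up
names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration only; Codec is a hypothetical stand-in,
// not Beam's Coder.
class ListTypeSketch {

  interface Codec<T> {
    String encode(T value);
  }

  // Plays the role of a coder written for the general List type.
  static Codec<List<Integer>> listCodec() {
    return value -> value.toString();
  }

  public static void main(String[] args) {
    // Does not compile: generics are invariant, so a Codec<List<Integer>>
    // is not a Codec<ArrayList<Integer>>.
    // Codec<ArrayList<Integer>> bad = listCodec();

    // Declaring the type as List works; the instance can still be an ArrayList.
    List<Integer> values = new ArrayList<>();
    values.add(1);
    values.add(2);
    System.out.println(listCodec().encode(values)); // prints [1, 2]
  }
}
```

Under that assumption, declaring the element type as List (as in
TreeMap<String, List<Integer>>) lets the List-typed coder line up with the
declared type, while the values themselves can remain ArrayList instances.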

On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik <lc...@google.com> wrote:

> Additional coders would be useful. Note that we usually don't have coders
> for specific collection types like ArrayList but prefer to have Coders for
> their general counterparts like List, Map, Iterable, ....
>
> There has been discussion in the past to make the MapCoder a deterministic
> coder when a coder is required to be deterministic. There are a few people
> working on schema support within Apache Beam that might be able to provide
> guidance (+Reuven Lax <re...@google.com> +Brian Hulette
> <bh...@google.com>).
>
> On Fri, Jul 12, 2019 at 11:05 AM Shannon Duncan <
> joseph.duncan@liveramp.com> wrote:
>
>> I have a working TreeMapCoder now. Got it all setup and done, and the
>> GroupByKey is accepting it.
>>
>> Thanks for all the help. I need to read up more on contributing
>> guidelines then I'll PR the coder into the SDK. Also willing to write
>> coders for things such as ArrayList etc if people want them.
>>
>> On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan <
>> joseph.duncan@liveramp.com> wrote:
>>
>>> Aha, makes sense. Thanks!
>>>
>>> On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>>>>
>>>> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <
>>>> joseph.duncan@liveramp.com> wrote:
>>>>
>>>>> So I have my custom coder created for TreeMap and I'm ready to set
>>>>> it...
>>>>>
>>>>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>>>>
>>>>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>>>>
>>>>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <ru...@google.com> wrote:
>>>>>
>>>>>> Hi Shannon,  [1] will be a good start on coder in Java SDK.
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>>>>
>>>>>> Rui
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>
>>>>>>> Was able to get it to use ArrayList by doing List<List<Integer>>
>>>>>>> result = new ArrayList<List<Integer>>();
>>>>>>>
>>>>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>>>>> input to key for the list of lists.
>>>>>>>
>>>>>>> Thanks for the help, lemme know more in the future about how coders
>>>>>>> work and instantiate and I'd love to help contribute by adding some new
>>>>>>> coders.
>>>>>>>
>>>>>>> - Shannon
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>>
>>>>>>>> Will do. Thanks. A new coder for deterministic Maps would be great
>>>>>>>> in the future. Thank you!
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <ru...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I think Mike refers to ListCoder
>>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java> which
>>>>>>>>> is deterministic if its element is the same. Maybe you can search the repo
>>>>>>>>> for examples of ListCoder?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>>>>
>>>>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <ru...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Shannon, I agree with Mike on List is a good workaround if your
>>>>>>>>>>> element within list is deterministic and you are eager to make your new
>>>>>>>>>>> pipeline working.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Let me send back some pointers to adding new coder later.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Rui
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>>>>>> joseph.duncan@liveramp.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I just started learning Java today to attempt to convert our
>>>>>>>>>>>> python pipelines to Java to take advantage of key features that Java has. I
>>>>>>>>>>>> have no idea how I would create a new coder and include it in for beam to
>>>>>>>>>>>> recognize.
>>>>>>>>>>>>
>>>>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>>>>> together I might be able to figure that out. I can duplicate MapCoder and
>>>>>>>>>>>> try to make changes, but how will beam know to pick up that coder for a
>>>>>>>>>>>> groupByKey?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Shannon
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <ru...@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It could be just straightforward to create a SortedMapCoder
>>>>>>>>>>>>> for TreeMap. Just add checks on map instances and then change
>>>>>>>>>>>>> verifyDeterministic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If this is a common need we could just submit it into Beam
>>>>>>>>>>>>> repo.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]:
>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>>>>>> mike@mikepedersen.dk> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even
>>>>>>>>>>>>>> if your datastructure is deterministic, Beam will assume the serialized
>>>>>>>>>>>>>> bytes aren't deterministic.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>>>>> Just change it such that the exception in VerifyDeterministic
>>>>>>>>>>>>>> is removed and when decoding it instantiates a TreeMap or such instead of a
>>>>>>>>>>>>>> HashMap.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mike
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Den tor. 11. jul. 2019 kl. 22.41 skrev Shannon Duncan <
>>>>>>>>>>>>>> joseph.duncan@liveramp.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So I'm working on essentially doing a word-count on a
>>>>>>>>>>>>>>> complex data structure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I tried just using a HashMap as the Structure, but that
>>>>>>>>>>>>>>> didn't work because it is non-deterministic.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However when Given a LinkedHashMap or TreeMap which is
>>>>>>>>>>>>>>> deterministic the SDK complains that it's non-deterministic when trying to
>>>>>>>>>>>>>>> use it as a key for GroupByKey.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What would be an appropriate Map style data structure that
>>>>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Shannon
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
