Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/03/14 16:29:16 UTC

[GitHub] [beam] robertwb commented on a change in pull request #17037: Add deterministic dict coding via key sorting.

robertwb commented on a change in pull request #17037:
URL: https://github.com/apache/beam/pull/17037#discussion_r826149518



##########
File path: sdks/python/apache_beam/coders/coder_impl.py
##########
@@ -651,18 +670,27 @@ class MapCoderImpl(StreamCoderImpl):
   def __init__(
       self,
       key_coder,  # type: CoderImpl
-      value_coder  # type: CoderImpl
+      value_coder,  # type: CoderImpl
+      is_deterministic = False
   ):
     self._key_coder = key_coder
     self._value_coder = value_coder
+    self._is_deterministic = is_deterministic
 
   def encode_to_stream(self, dict_value, out, nested):
     out.write_bigendian_int32(len(dict_value))
-    for key, value in dict_value.items():
-      # Note this implementation always uses nested context when encoding keys
-      # and values which differs from Java. See note in docstring.
-      self._key_coder.encode_to_stream(key, out, True)
-      self._value_coder.encode_to_stream(value, out, True)
+    # Note this implementation always uses nested context when encoding keys
+    # and values which differs from Java. See note in docstring.
+    if self._is_deterministic:
+      for key, value in sorted(dict_value.items()):
+        self._key_coder.encode_to_stream(key, out, True)
+        self._value_coder.encode_to_stream(value, out, True)
+    else:
+      # This loop is separate from the above so the dict.items() call will be

Review comment:
       Clarified here and below: it's the plain `for k, v in dict.items()` loop that is optimized for dictionaries. (Wrapping the call in `sorted()` breaks this pattern.)
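
       For illustration only, here is a minimal sketch of why the diff keeps two loops. It is not Beam's `MapCoderImpl`: `encode_map` and `_encode_entry` are hypothetical helpers with a made-up wire format (big-endian int32 count, then length-prefixed UTF-8 keys and int32 values). It assumes string keys and int values, and that the keys are mutually orderable so `sorted()` can compare them.

```python
import struct


def _encode_entry(key, value):
    # Hypothetical entry format: length-prefixed UTF-8 key, then int32 value.
    raw = key.encode("utf-8")
    return struct.pack(">i", len(raw)) + raw + struct.pack(">i", value)


def encode_map(dict_value, is_deterministic=False):
    """Toy stand-in for a map encoder; NOT Beam's actual implementation."""
    out = [struct.pack(">i", len(dict_value))]  # big-endian int32 entry count
    if is_deterministic:
        # Sorting fixes the iteration order, so equal dicts encode identically.
        for key, value in sorted(dict_value.items()):
            out.append(_encode_entry(key, value))
    else:
        # Kept as a separate loop so it remains a plain
        # `for k, v in d.items()` with nothing wrapped around the call.
        for key, value in dict_value.items():
            out.append(_encode_entry(key, value))
    return b"".join(out)


a = {"x": 1, "y": 2}
b = {"y": 2, "x": 1}  # equal to `a`, but inserted in a different order
```

       With these inputs, `a == b` holds, yet `encode_map(a)` and `encode_map(b)` produce different bytes because each follows insertion order; passing `is_deterministic=True` makes the two encodings match.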



