Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/08/26 18:10:41 UTC

[GitHub] [beam] lukecwik commented on a change in pull request #12678: [BEAM-10703] Add a step property for shardable states during Dataflow graph translation (Java)

lukecwik commented on a change in pull request #12678:
URL: https://github.com/apache/beam/pull/12678#discussion_r477492609



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java
##########
@@ -100,8 +99,7 @@ public long getBatchSize() {
         ParDo.of(new GroupIntoBatchesDoFn<>(batchSize, allowedLateness, keyCoder, valueCoder)));
   }
 
-  @VisibleForTesting
-  static class GroupIntoBatchesDoFn<K, InputT>
+  public static class GroupIntoBatchesDoFn<K, InputT>

Review comment:
       This has been done in a few ways in the past:
   * (easiest) record which transforms need this property within the DataflowRunner and then look up this information during translation (e.g. [doesPCollectionRequireIndexedFormat](https://github.com/apache/beam/blob/b1849ed09fb236906ff0b83b0f394c08b05d4b3c/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L1260))
   * replace the public GroupIntoBatches transform with a Dataflow-specific `primitive` that exposes any additional information needed during translation (e.g. [DataflowRunner.CombineGroupedValues](https://github.com/apache/beam/blob/b1849ed09fb236906ff0b83b0f394c08b05d4b3c/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java#L799))
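A minimal, self-contained sketch of the first (easiest) approach listed above follows. It deliberately does not use the real Beam or Dataflow classes: `Runner`, `Translator`, `recordShardableState`, `needsShardableState`, and the `uses_shardable_state` property name are all hypothetical stand-ins, and identifying transforms by their full name is an assumption made only for illustration. The idea it shows is that the runner records which applications need the extra step property ahead of time, and the translator merely looks that up, so presumably `GroupIntoBatchesDoFn` would not have to become public.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: the runner records which transforms need an extra step property and
 * the translator looks that information up. None of these classes are real Beam/Dataflow APIs.
 */
public class RecordAndLookupSketch {

  /** Stand-in for a runner such as DataflowRunner. */
  public static class Runner {
    // Full names of transforms that need the extra property, recorded before translation.
    private final Set<String> transformsNeedingShardableState = new HashSet<>();

    public void recordShardableState(String transformFullName) {
      transformsNeedingShardableState.add(transformFullName);
    }

    public boolean needsShardableState(String transformFullName) {
      return transformsNeedingShardableState.contains(transformFullName);
    }
  }

  /** Stand-in for the translator that turns each transform into a job step. */
  public static class Translator {
    private final Runner runner;

    public Translator(Runner runner) {
      this.runner = runner;
    }

    public Map<String, Object> translateStep(String transformFullName) {
      Map<String, Object> stepProperties = new HashMap<>();
      stepProperties.put("user_name", transformFullName);
      // Only consult what the runner recorded; the transform's DoFn class is never inspected.
      if (runner.needsShardableState(transformFullName)) {
        stepProperties.put("uses_shardable_state", true); // hypothetical property name
      }
      return stepProperties;
    }
  }

  public static void main(String[] args) {
    Runner runner = new Runner();
    // In a real runner this would be recorded when a GroupIntoBatches application is seen.
    runner.recordShardableState("MyPipeline/GroupIntoBatches");
    Translator translator = new Translator(runner);
    System.out.println(translator.translateStep("MyPipeline/GroupIntoBatches"));
    System.out.println(translator.translateStep("MyPipeline/MapElements"));
  }
}
```

The trade-off is that the recorded set lives outside the transform, so it has to be kept in sync with the pipeline being translated.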

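The second approach listed above can be sketched in the same hypothetical, self-contained style. `Transform`, `PublicGroupIntoBatches`, `RunnerGroupIntoBatches`, `applyOverrides`, and the step property names are illustrative stand-ins, not the real Beam override machinery or `DataflowRunner.CombineGroupedValues`. The sketch assumes the runner rewrites the pipeline before translation, swapping the public transform for a runner-owned primitive whose fields carry exactly what the translator needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: replace a public transform with a runner-specific primitive that
 * exposes the information translation needs. Not real Beam/Dataflow classes.
 */
public class ReplaceWithPrimitiveSketch {

  /** Stand-in for any transform in the user's pipeline. */
  public static class Transform {
    public final String name;

    public Transform(String name) {
      this.name = name;
    }
  }

  /** Stand-in for the public GroupIntoBatches transform; its internals stay hidden. */
  public static class PublicGroupIntoBatches extends Transform {
    public final long batchSize;

    public PublicGroupIntoBatches(String name, long batchSize) {
      super(name);
      this.batchSize = batchSize;
    }
  }

  /** Runner-owned primitive: carries the extra information translation needs. */
  public static class RunnerGroupIntoBatches extends Transform {
    public final long batchSize;
    public final boolean usesShardableState;

    public RunnerGroupIntoBatches(PublicGroupIntoBatches original) {
      super(original.name);
      this.batchSize = original.batchSize;
      this.usesShardableState = true; // decided by the runner, not the public transform
    }
  }

  /** Override pass: swap every public GroupIntoBatches for the runner primitive. */
  public static List<Transform> applyOverrides(List<Transform> pipeline) {
    List<Transform> result = new ArrayList<>();
    for (Transform t : pipeline) {
      if (t instanceof PublicGroupIntoBatches) {
        result.add(new RunnerGroupIntoBatches((PublicGroupIntoBatches) t));
      } else {
        result.add(t);
      }
    }
    return result;
  }

  /** Translator: a dedicated case for the primitive reads its fields directly. */
  public static Map<String, Object> translate(Transform t) {
    Map<String, Object> step = new HashMap<>();
    step.put("user_name", t.name);
    if (t instanceof RunnerGroupIntoBatches) {
      RunnerGroupIntoBatches primitive = (RunnerGroupIntoBatches) t;
      step.put("batch_size", primitive.batchSize);
      step.put("uses_shardable_state", primitive.usesShardableState);
    }
    return step;
  }

  public static void main(String[] args) {
    List<Transform> pipeline =
        Arrays.asList(new Transform("Read"), new PublicGroupIntoBatches("Batch", 100L));
    for (Transform t : applyOverrides(pipeline)) {
      System.out.println(translate(t));
    }
  }
}
```

Compared with the first sketch, the information travels with the transform itself, at the cost of maintaining a runner-specific replacement class.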



----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org