Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/04/08 03:04:12 UTC

[GitHub] [beam] udim commented on a change in pull request #11241: [BEAM-5422] Document DynamicDestinations.getTable uniqueness requirement

URL: https://github.com/apache/beam/pull/11241#discussion_r405231575
 
 

 ##########
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.java
 ##########
 @@ -142,7 +142,11 @@ void setSideInputAccessorFromProcessContext(DoFn<?, ?>.ProcessContext context) {
     return null;
   }
 
-  /** Returns a {@link TableDestination} object for the destination. May not return null. */
+  /**
+   * Returns a {@link TableDestination} object for the destination. May not return null. Return
+   * value needs to be unique to each destination: may not return the same {@link TableDestination}
+   * for different destinations.
 
 Review comment:
   TL;DR: Pablo is right.
   
   In the Python SDK, a user function translates an element to a TableReference.
   In the Java SDK, a user-supplied DynamicDestinations instance translates an element to a DestinationT, and then to a TableDestination.
   
   Java reshuffles on (DestinationT, element) pairs, while Python reshuffles on (TableReference, element) pairs.
   
   (Not sure why Java uses an intermediate DestinationT. Convenience? Better GBK performance? Lower resource use?)
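
To make the difference between the two groupings concrete, here is a minimal stdlib-only sketch. It is not Beam code: the class `DestinationGrouping`, the string-typed destinations, the table names, and the toy `getDestination`/`getTable` bodies are all hypothetical stand-ins. It assumes a `getTable` that maps two different destinations to the same table, and compares grouping on the intermediate destination with grouping on the resolved table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Stdlib-only model of the two grouping strategies. Illustrative only, not Beam code. */
public class DestinationGrouping {

  /** Groups elements by the key that keyFn assigns to each one, keeping first-seen key order. */
  public static <K> Map<K, List<String>> groupBy(
      List<String> elements, Function<String, K> keyFn) {
    Map<K, List<String>> groups = new LinkedHashMap<>();
    for (String element : elements) {
      groups.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>()).add(element);
    }
    return groups;
  }

  /** Hypothetical getDestination: the intermediate key is the element's first character. */
  public static String getDestination(String element) {
    return element.substring(0, 1);
  }

  /** Hypothetical getTable that is NOT unique per destination: "a" and "b" share a table. */
  public static String getTable(String destination) {
    return destination.equals("c") ? "project:dataset.other" : "project:dataset.shared";
  }

  public static void main(String[] args) {
    List<String> elements = Arrays.asList("a1", "b1", "a2", "c1");

    // Java-style: group on the intermediate destination, then resolve the table per group.
    Map<String, List<String>> byDestination =
        groupBy(elements, DestinationGrouping::getDestination);
    byDestination.forEach(
        (dest, group) -> System.out.println(dest + " -> " + getTable(dest) + " " + group));

    // Python-style: group directly on the resolved table.
    Map<String, List<String>> byTable = groupBy(elements, e -> getTable(getDestination(e)));
    byTable.forEach((table, group) -> System.out.println(table + " " + group));
  }
}
```

In this toy model, grouping on the destination yields separate groups for "a" and "b" that both resolve to the same table, whereas grouping on the table merges them into one group. The first case is the situation the proposed Javadoc wording asks implementers to avoid.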
