Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/09/02 18:08:53 UTC

[GitHub] [beam] scwhittle commented on a change in pull request #15436: [WIP] Add option for FileIO to stage files directly for unbounded input with fixed windows

scwhittle commented on a change in pull request #15436:
URL: https://github.com/apache/beam/pull/15436#discussion_r701313898



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
##########
@@ -373,13 +397,70 @@ public void validate(PipelineOptions options) {
 
     boolean fixedSharding = getComputeNumShards() != null || getNumShardsProvider() != null;
     PCollection<List<FileResult<DestinationT>>> tempFileResults;
-    if (fixedSharding) {
+    if (getStageFilesDirectly()) {
+      checkArgument(
+          input.getWindowingStrategy().getWindowFn() instanceof FixedWindows,
+          "Must be a fixed window to promote window to key.");

Review comment:
       It definitely can be relaxed to accept non-merging windows.  We could accept merging windows as well, but the behavior might be less consistent because we won't run window merging before the GroupByKey, and thus only exact window matches could be staged directly. It might be better to just disallow merging windows for that reason.
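
To make the exact-match concern concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Beam SDK: `WindowKeySketch`, `Interval`, `groupByExactWindow`, and `countAfterMerging` are hypothetical names invented for illustration, and the merge rule (overlapping intervals merge, abutting ones do not) is an assumption modeled loosely on session-style windows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical illustration only; none of these types are Beam classes. */
final class WindowKeySketch {

  /** Half-open interval [start, end), standing in for a window. */
  static final class Interval {
    final long start;
    final long end;

    Interval(long start, long end) {
      this.start = start;
      this.end = end;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Interval)) {
        return false;
      }
      Interval other = (Interval) o;
      return start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
      return Objects.hash(start, end);
    }
  }

  /** Counts elements per window using exact equality, as a window-as-key grouping would. */
  static Map<Interval, Integer> groupByExactWindow(List<Interval> elementWindows) {
    Map<Interval, Integer> groups = new LinkedHashMap<>();
    for (Interval w : elementWindows) {
      groups.merge(w, 1, Integer::sum);
    }
    return groups;
  }

  /** Counts the windows left after merging overlapping intervals (assumed merge rule). */
  static int countAfterMerging(List<Interval> elementWindows) {
    List<Interval> sorted = new ArrayList<>(elementWindows);
    sorted.sort(Comparator.comparingLong((Interval w) -> w.start));
    int count = 0;
    long currentEnd = Long.MIN_VALUE;
    for (Interval w : sorted) {
      if (count == 0 || w.start >= currentEnd) {
        // No overlap with the window being built: start a new merged window.
        count++;
        currentEnd = w.end;
      } else {
        // Overlap: extend the current merged window.
        currentEnd = Math.max(currentEnd, w.end);
      }
    }
    return count;
  }
}
```

With element windows [0,10), [5,15), and [0,10), exact-match grouping produces two groups, while merging first would produce a single window. With non-merging windows the two counts agree, which is why relaxing the check to non-merging windows looks safe.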
   
   If we move this behavior to the non-fixed sharding path using GroupIntoBatches, that doesn't currently support windowing/triggering either:
   https://issues.apache.org/jira/browse/BEAM-12040
   
   I kind of agree with https://issues.apache.org/jira/browse/BEAM-4604 that windowing and triggering are a bit confusing when applied to composite transforms such as FileIO.  Unless you know whether the implementation uses GroupByKey, you don't know if the windowing/triggering will have any effect.
   
   



