You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Niel Markwick (JIRA)" <ji...@apache.org> on 2019/01/11 15:26:00 UTC

[jira] [Comment Edited] (BEAM-6407) regression: FileIO.writeDynamic() with side inputs fails in DirectRunner

    [ https://issues.apache.org/jira/browse/BEAM-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740480#comment-16740480 ] 

Niel Markwick edited comment on BEAM-6407 at 1/11/19 3:25 PM:
--------------------------------------------------------------

Attached reproduction code: beam-filewriter-demo.tgz
pom has profiles for beam 2.9.0 and 2.8.0.

mvn compile exec:java \
   -Dexec.mainClass=com.google.nielm.FileWriterTest \
   -Dexec.args="--runner=DirectRunner" -Pbeam-2-9-0

---- testFileContextfulNaming —  
---- SUCCESS
---- testFileWriterContextfulSideInputNaming — 
java.lang.IllegalStateException: All PCollectionViews that are consumed must be written by some WriteView PTransform: Missing [<unnamed> [RunnerPCollectionView
<snip>
---- FAIL 
---- testFileWriterContextfulSideInputNamingSingleShard —  
---- SUCCESS

mvn compile exec:java \
   -Dexec.mainClass=com.google.nielm.FileWriterTest \
   -Dexec.args="--runner=DirectRunner" -Pbeam-2-8-0 

---- testFileContextfulNaming — 
---- SUCCESS 
---- testFileWriterContextfulSideInputNaming — 
---- SUCCESS
---- testFileWriterContextfulSideInputNamingSingleShard — 
---- SUCCESS


was (Author: nielm):
Attached reproduction code: beam-filewriter-demo.tgz

 

{{mvn compile exec:java \}}
{{   -Dexec.mainClass=com.google.nielm.FileWriterTest \}}
{{   -Dexec.args="--runner=DirectRunner" -Pbeam-2-9-0}}

{{---- testFileContextfulNaming ---  }}
{{---- SUCCESS }}
{{---- testFileWriterContextfulSideInputNaming ---  }}
{{java.lang.IllegalStateException: All PCollectionViews that are consumed must be written by some WriteView PTransform: Missing [<unnamed> [RunnerPCollectionView}}
{{]]}}{{<snip>}}
{{ ---- FAIL }}
{{---- testFileWriterContextfulSideInputNamingSingleShard ---  }}
{{---- SUCCESS}}
{{mvn compile exec:java \}}
{{   -Dexec.mainClass=com.google.nielm.FileWriterTest \}}
{{   -Dexec.args="--runner=DirectRunner" -Pbeam-2-8-0}}
{{---- testFileContextfulNaming ---  }}
{{---- SUCCESS }}
{{---- testFileWriterContextfulSideInputNaming ---  }}
{{---- SUCCESS }}
{{---- testFileWriterContextfulSideInputNamingSingleShard ---  }}
{{---- SUCCESS}}

> regression: FileIO.writeDynamic() with side inputs fails in DirectRunner
> ------------------------------------------------------------------------
>
>                 Key: BEAM-6407
>                 URL: https://issues.apache.org/jira/browse/BEAM-6407
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.9.0
>            Reporter: Niel Markwick
>            Assignee: Kenneth Knowles
>            Priority: Major
>              Labels: regression
>         Attachments: beam-filewriter-demo.tgz
>
>
> When FileIO.writeDynamic is used with automatic sharding and  a Contextful.Fn that uses side inputs for the file naming, DirectRunner (and TestPipeline) fail with: 
> {{java.lang.IllegalStateException: All PCollectionViews that are consumed must be written by some WriteView PTransform: Missing [<unnamed> [RunnerPCollectionView]]}}
>  
> Example code:  
> {\{PCollectionView<String> outputDirectoryname = }}
>  {{   pipeline.apply(}}
>  {{      "outputDir",}}
>  {{       Create.of("/tmp/testout")).apply(View.asSingleton());}}
>  {{Contextful.Fn<String, FileIO.Write.FileNaming> manifestNaming =}}
>  {{   (element, c) ->}}
>  \{{      (window, pane, numShards, shardIndex, compression) -> }}
>  {{         c.sideInput(outputDirectoryname);}}
> {{pipeline.apply(FileIO.<String, String>writeDynamic()}}
>  {{   .by(SerializableFunctions.constant(""))}}
>  {{   .withDestinationCoder(StringUtf8Coder.of())}}
>  {{   .via(TextIO.sink())}}
>  {{   .withTempDirectory("/tmp")}}
>  {{   .withNaming(Contextful.of(}}
>  {{      manifestNaming,}}
>  {{      Requirements.requiresSideInputs(outputDirectoryname))));}}
>  
> This does not occur in Dataflow-runner
> It does not occur if the ContextFul.Fn is not given side inputs.
> It does not occur if withNumShards(1) is set.
> It did not occur in 2.8.0, and does in 2.9.0
>  
> The cause appears to be due to the DirectRunner using TransformOverrides re-writing FileIO sinks to use runner-determined-sharding
> ( see [DirectRunner.java line 226|https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java#L226] )
>  but I do not know why this started occuring in 2.9.0...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)