You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Jeff Klukas (JIRA)" <ji...@apache.org> on 2019/01/10 01:19:00 UTC

[jira] [Commented] (BEAM-6399) FileIO errors on unbounded input with nondefault trigger

    [ https://issues.apache.org/jira/browse/BEAM-6399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738883#comment-16738883 ] 

Jeff Klukas commented on BEAM-6399:
-----------------------------------

I previously thought this problem occurred only when a ValueProvider was passed for numShards, but it turns out I was using a non-ValueProvider pipeline parameter that defaulted to zero, thus unsharded. I modified the description to capture that.

> FileIO errors on unbounded input with nondefault trigger
> --------------------------------------------------------
>
>                 Key: BEAM-6399
>                 URL: https://issues.apache.org/jira/browse/BEAM-6399
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-files
>            Reporter: Jeff Klukas
>            Priority: Major
>
> {{In a pipeline with unbounded input, if a user defines a custom trigger and does not specify a specific non-zero withNumShards, they may see an IllegalArgumentException at runtime due to incompatible windows.}}
>   For example, consider this compound trigger:
> {{Window.into(new GlobalWindows())}}
>  {{  .triggering(Repeatedly.forever(AfterFirst.of(}}
>  {{    AfterPane.elementCountAtLeast(10000),}}
>  {{    AfterProcessingTime.pastFirstElementInPane()}}
>  {{              .plusDelayOf(Duration.standardMinutes(10)))))}}
> {{  .discardingFiredPanes()}}
>  
>  Using that windowing without specifying sharding yields:
>   
>  {{Inputs to Flatten had incompatible triggers:}}{{Repeatedly.forever(AfterFirst.of(AfterPane.elementCountAtLeast(10000), AfterProcessingTime.pastFirstElementInPane().plusDelayOf(1 minute))),}}{{Repeatedly.forever(AfterFirst.of(AfterPane.elementCountAtLeast(1), AfterSynchronizedProcessingTime.pastFirstElementInPane()))}}
>   
>  Without explicit sharding, WriteFiles creates both a sharded and unsharded collection; the first goes through one GroupByKey while the other goes through 2. These two collections are then flattened together and they have incompatible triggers due to the double-grouped collection using a continuation trigger.
>   
>  If the user instead specifies numShards, then a different code path is followed that avoids this incompatibility.
>   
>  It looks like WriteFiles may need to be implemented differently to avoid combining collections with potentially incompatible triggers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)