Posted to commits@beam.apache.org by "Daniel Halperin (JIRA)" <ji...@apache.org> on 2017/03/28 22:53:41 UTC

[jira] [Commented] (BEAM-1822) Improve handling of eventually-consistent filepatterns

    [ https://issues.apache.org/jira/browse/BEAM-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946122#comment-15946122 ] 

Daniel Halperin commented on BEAM-1822:
---------------------------------------

I was hoping that [BEAM-60] would be the best way to handle this. Thoughts?

> Improve handling of eventually-consistent filepatterns
> ------------------------------------------------------
>
>                 Key: BEAM-1822
>                 URL: https://issues.apache.org/jira/browse/BEAM-1822
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Daniel Halperin
>
> Reading from an eventually consistent filepattern (e.g. one located in a multi-regional Google Cloud Storage bucket) using FileBasedSource is dangerous, because it may silently process less data than the user expects if the match call does not return all of the files.
> We should improve our handling of this case. I'd suggest aiming to minimize the chance of silent data loss. Here are a couple of things we could do:
> - Let the user supply an expected number of files to be matched, and fail the pipeline if the actual number is different. For special filepatterns like XXX-of-YYY, we can autodetect the expected number.
> - Poll the filepattern for a while (perhaps for a period determined by the underlying IOChannelFactory, which knows the typical eventual-consistency convergence times of its filesystem), and either wait until it quiesces, or fail the pipeline if it doesn't.
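
The two suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Beam code: the class FilepatternChecks, its method names, and the "-XXX-of-YYY" regular expression are all hypothetical, and the match call is stood in for by a plain Supplier rather than any real IOChannelFactory API.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Beam SDK. */
final class FilepatternChecks {
  // Assumed shard naming: "...-XXX-of-YYY...", e.g. "out-00003-of-00012.txt".
  private static final Pattern SHARD = Pattern.compile("-(\\d+)-of-(\\d+)");

  /** Returns YYY from a name containing "-XXX-of-YYY", or empty if absent. */
  public static Optional<Integer> expectedShardCount(String filename) {
    Matcher m = SHARD.matcher(filename);
    return m.find() ? Optional.of(Integer.parseInt(m.group(2))) : Optional.empty();
  }

  /** Throws if any matched name advertises a shard count that differs from the match size. */
  public static void checkShardCount(Set<String> matched) {
    for (String name : matched) {
      Optional<Integer> expected = expectedShardCount(name);
      if (expected.isPresent() && expected.get() != matched.size()) {
        throw new IllegalStateException(
            "Expected " + expected.get() + " files but matched " + matched.size());
      }
    }
  }

  /**
   * Re-runs the match until the result is unchanged for stableRounds consecutive
   * polls, and throws if that does not happen within maxRounds polls.
   */
  public static Set<String> pollUntilStable(
      Supplier<Set<String>> match, int stableRounds, int maxRounds, long sleepMillis) {
    Set<String> previous = match.get();
    int stable = 0;
    for (int round = 0; round < maxRounds; round++) {
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while polling filepattern", e);
      }
      Set<String> current = match.get();
      // Any change in the match result resets the stability counter.
      stable = current.equals(previous) ? stable + 1 : 0;
      previous = current;
      if (stable >= stableRounds) {
        return current;
      }
    }
    throw new IllegalStateException(
        "Filepattern did not quiesce after " + maxRounds + " polls");
  }
}
```

Both checks fail the pipeline loudly rather than reading a partial file set. The polling parameters are placeholders; in the proposal they would come from whatever the filesystem knows about its own convergence times.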



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)