Posted to commits@beam.apache.org by "Daniel Halperin (JIRA)" <ji...@apache.org> on 2016/07/19 00:09:20 UTC

[jira] [Resolved] (BEAM-434) Limit the number of output files a beam-examples execution writes

     [ https://issues.apache.org/jira/browse/BEAM-434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Halperin resolved BEAM-434.
----------------------------------
       Resolution: Fixed
         Assignee: Thomas Groh  (was: Amit Sela)
    Fix Version/s: 0.2.0-incubating

> Limit the number of output files a beam-examples execution writes
> -----------------------------------------------------------------
>
>                 Key: BEAM-434
>                 URL: https://issues.apache.org/jira/browse/BEAM-434
>             Project: Beam
>          Issue Type: Bug
>          Components: examples-java
>            Reporter: Amit Sela
>            Assignee: Thomas Groh
>            Priority: Minor
>             Fix For: 0.2.0-incubating
>
>
> When using `TextIO.Write.to("/path/to/output")` without any restriction on the number of shards, the pipeline may generate many output files, depending on your input. For WordCount, for example, you'll get as many output files as there are unique words in your input.
> Since I think examples are expected to run in a friendly manner that lets you "see" what they do, rather than being optimized for performance, I suggest using `withoutSharding()` when writing the example output to a file.
> Examples I could find that behave this way:
> org.apache.beam.examples.WordCount
> org.apache.beam.examples.complete.TfIdf
> org.apache.beam.examples.cookbook.DeDupExample
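
As a rough illustration of the difference described in the issue, the standalone sketch below models the file names a sharded text write would produce compared with an unsharded one. It does not call Beam. The class and method names (`ShardNameSketch`, `shardedNames`, `unshardedName`) are hypothetical, and the `-SSSSS-of-NNNNN` suffix pattern is an assumption about the default shard naming, not something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models output file names only; it does not use Beam.
public class ShardNameSketch {

    // Assumed default naming: "<prefix>-SSSSS-of-NNNNN", one file per shard.
    static List<String> shardedNames(String prefix, int numShards) {
        List<String> names = new ArrayList<>();
        for (int shard = 0; shard < numShards; shard++) {
            names.add(String.format("%s-%05d-of-%05d", prefix, shard, numShards));
        }
        return names;
    }

    // Models the effect asked for in the issue: a single file, assumed here
    // to be named exactly as the given prefix with no shard suffix.
    static String unshardedName(String prefix) {
        return prefix;
    }

    public static void main(String[] args) {
        // With several shards, the output is spread over several files.
        for (String name : shardedNames("/path/to/output", 3)) {
            System.out.println(name);
        }
        // Without sharding, there is a single file to inspect.
        System.out.println(unshardedName("/path/to/output"));
    }
}
```

In a real pipeline the shard count is chosen by the runner unless it is restricted, which is why an example run can leave behind more files than a reader expects.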



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)