Posted to commits@beam.apache.org by "Florian Scharinger (JIRA)" <ji...@apache.org> on 2016/07/03 23:44:10 UTC

[jira] [Commented] (BEAM-55) Allow users to compress FileBasedSink output files

    [ https://issues.apache.org/jira/browse/BEAM-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360708#comment-15360708 ] 

Florian Scharinger commented on BEAM-55:
----------------------------------------

We have a use case where we write one Avro file per customer per day. Writing each file compressed should significantly speed up both writing and reading, while still allowing scalable reads, since we would have thousands of files per day. At the moment, our Dataflow job spends a significant amount of time just reading the (uncompressed) files.
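
To illustrate the pattern described above (many small compressed files, each read sequentially, with parallelism coming from the number of files), here is a minimal sketch in plain Java. It is not Beam's API and not Avro: the class name PerCustomerGzipSketch, the ".txt.gz" naming scheme, and the use of gzip-compressed text lines as a stand-in for Avro records are all assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only; not part of Beam or Avro.
final class PerCustomerGzipSketch {

    // Writes one gzip-compressed file per customer into dir and returns the paths.
    // Assumes customer ids are safe to use as file names.
    static List<Path> writePerCustomer(Path dir, Map<String, List<String>> recordsByCustomer)
            throws IOException {
        List<Path> files = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : recordsByCustomer.entrySet()) {
            Path file = dir.resolve(entry.getKey() + ".txt.gz");
            try (Writer out = new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
                for (String record : entry.getValue()) {
                    out.write(record);
                    out.write('\n');
                }
            }
            files.add(file);
        }
        return files;
    }

    // Reads a single compressed file from start to end; a gzip stream
    // cannot be split, so one file is always read by one reader.
    static List<String> readFile(Path file) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.toList());
        }
    }

    // Counts records across all files, parallelizing across files
    // rather than within a file.
    static long countRecords(List<Path> files) {
        return files.parallelStream().mapToLong(file -> {
            try {
                return readFile(file).size();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }).sum();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("per-customer");
        Map<String, List<String>> data = new TreeMap<>();
        data.put("customer-a", Arrays.asList("a1", "a2"));
        data.put("customer-b", Arrays.asList("b1"));
        List<Path> files = writePerCustomer(dir, data);
        System.out.println("files=" + files.size() + " records=" + countRecords(files));
    }
}
```

Each file is decompressed by a single reader, which is the limitation the issue description warns about; with thousands of files per day, the parallelism across files may be enough to compensate.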

> Allow users to compress FileBasedSink output files
> --------------------------------------------------
>
>                 Key: BEAM-55
>                 URL: https://issues.apache.org/jira/browse/BEAM-55
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>
> FileBasedSink (also TextIO.Write, AvroIO.Write, etc.) does not have an option for compressing its output.
> In general, we discourage compression because it limits or blocks scalably reading from a file in parallel. However, users may want it -- so we should support the option (with appropriate warnings).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)