Posted to commits@beam.apache.org by "Christopher Hebert (JIRA)" <ji...@apache.org> on 2017/08/07 20:23:00 UTC

[jira] [Updated] (BEAM-2751) Write PCollection elements to individual files

     [ https://issues.apache.org/jira/browse/BEAM-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Hebert updated BEAM-2751:
-------------------------------------
    Description: 
I'd like to write elements as individual files.

Rather than packing thousands of outputs into a handful of sharded files as TextIO does (output-00000-of-00005, output-00001-of-00005, ...), I want to write each element to its own file.

For example, if I used WholeFileIO from [BEAM-2750] to read three files (hi.txt, what.txt, and yes.txt), I'd like to write the processed contents out to individual files with user-defined or data-defined filenames (such as hi-modified.txt, what-modified.txt, and yes-modified.txt).

With a WholeFileIO, this would look like:

{code:java}
PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read", WholeFileIO.read().from("/path/to/input/dir/*"));
...
// Do stuff that changes contents and file names
...
modifiedFileNamesAndBytes.apply("Write", WholeFileIO.write().to("/path/to/output/dir/"));
{code}
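
To make the requested write semantics concrete, here is a minimal sketch outside of Beam, using only java.nio. It is not the proposed WholeFileIO implementation: the class PerElementFileWriter and its methods withSuffix and writeEach are hypothetical names invented for illustration, and a plain Map stands in for the PCollection of file names and bytes. It shows only the intended outcome, which is one output file per element, named from the element's key.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Beam: illustrates one-file-per-element output.
final class PerElementFileWriter {

    // Inserts a suffix before the extension, e.g. "hi.txt" becomes "hi-modified.txt".
    static String withSuffix(String fileName, String suffix) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0) {
            // No extension (or a leading-dot name): append the suffix at the end.
            return fileName + suffix;
        }
        return fileName.substring(0, dot) + suffix + fileName.substring(dot);
    }

    // Writes each (file name, contents) pair to its own file under outputDir.
    static void writeEach(Map<String, byte[]> fileNamesAndBytes, Path outputDir)
            throws IOException {
        Path root = outputDir.normalize();
        Files.createDirectories(root);
        for (Map.Entry<String, byte[]> element : fileNamesAndBytes.entrySet()) {
            Path target = root.resolve(element.getKey()).normalize();
            // Reject names that would escape the output directory.
            if (!target.startsWith(root)) {
                throw new IllegalArgumentException("Bad file name: " + element.getKey());
            }
            Files.write(target, element.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the processed elements; real contents would come from the pipeline.
        Map<String, byte[]> elements = new LinkedHashMap<>();
        for (String name : new String[] {"hi.txt", "what.txt", "yes.txt"}) {
            elements.put(withSuffix(name, "-modified"),
                    name.getBytes(StandardCharsets.UTF_8));
        }
        Path out = Files.createTempDirectory("per-element-output");
        writeEach(elements, out);
        System.out.println("Wrote " + elements.size() + " files to " + out);
    }
}
```

A real transform would also have to handle distributed workers, retries, and non-local filesystems, none of which this sketch attempts.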

This ticket complements [BEAM-2750].



> Write PCollection elements to individual files
> ----------------------------------------------
>
>                 Key: BEAM-2751
>                 URL: https://issues.apache.org/jira/browse/BEAM-2751
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Christopher Hebert
>            Assignee: Davor Bonaci



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)