Posted to commits@beam.apache.org by "Christopher Hebert (JIRA)" <ji...@apache.org> on 2017/08/07 20:20:00 UTC

[jira] [Updated] (BEAM-2750) Read whole files as one PCollection element each

     [ https://issues.apache.org/jira/browse/BEAM-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Hebert updated BEAM-2750:
-------------------------------------
    Summary: Read whole files as one PCollection element each  (was: Read whole files as one element each)

> Read whole files as one PCollection element each
> ------------------------------------------------
>
>                 Key: BEAM-2750
>                 URL: https://issues.apache.org/jira/browse/BEAM-2750
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Christopher Hebert
>            Assignee: Davor Bonaci
>
> I'd like to read each whole file as a single input element.
> If my input files are hi.txt, what.txt, and yes.txt, then the whole contents of hi.txt form one element of the returned PCollection, the whole contents of what.txt form another, and so on, giving me a PCollection with three elements.
> This contrasts with TextIO, which produces a new element for every line of text in the input files.
> This read (I'll call it WholeFileIO for now) would work like so:
> {code:java}
> PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read", WholeFileIO.read().from("/path/to/input/dir/*"));
> {code}
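
To make the requested semantics concrete, here is a minimal sketch in plain Java, outside of Beam, of what "one element per file" could look like: each matching file becomes a single (file name, file bytes) pair, however many lines it contains. The class `WholeFileReadSketch` and the method `readWholeFiles` are hypothetical names introduced only for this illustration. The sketch uses only `java.nio.file`, and it is not the proposed WholeFileIO or any existing Beam API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain java.nio, not Beam and not WholeFileIO.
class WholeFileReadSketch {

    /** Returns one (file name, file bytes) pair per regular file in dir that matches glob. */
    static List<Map.Entry<String, byte[]>> readWholeFiles(Path dir, String glob) {
        List<Map.Entry<String, byte[]>> elements = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (Files.isRegularFile(file)) {
                    // The entire file content becomes a single element; lines are not split.
                    elements.add(new SimpleImmutableEntry<>(
                            file.getFileName().toString(), Files.readAllBytes(file)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Directory listing order is unspecified, so sort by name for repeatable output.
        elements.sort(Map.Entry.comparingByKey());
        return elements;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, byte[]> element : readWholeFiles(dir, "*")) {
            System.out.println(element.getKey() + ": " + element.getValue().length + " bytes");
        }
    }
}
```

For a directory holding hi.txt, what.txt, and yes.txt, `readWholeFiles(dir, "*")` returns three pairs, one per file. The sort is only there to make the sketch's output repeatable; a PCollection itself carries no ordering guarantee.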



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)