Posted to issues@beam.apache.org by "Eugene Nikolaiev (Jira)" <ji...@apache.org> on 2021/08/11 18:28:00 UTC
[jira] [Updated] (BEAM-12741) Read multiple files keeping track of file names (Python)
[ https://issues.apache.org/jira/browse/BEAM-12741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Nikolaiev updated BEAM-12741:
------------------------------------
Fix Version/s: (was: 2.33.0)
> Read multiple files keeping track of file names (Python)
> --------------------------------------------------------
>
> Key: BEAM-12741
> URL: https://issues.apache.org/jira/browse/BEAM-12741
> Project: Beam
> Issue Type: Improvement
> Components: io-py-files
> Affects Versions: 2.31.0
> Reporter: Eugene Nikolaiev
> Priority: P3
> Labels: io, python, text
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When reading lines from text files matched by multiple patterns, it is sometimes useful to keep track of the file names from which the lines originated. Example: read tab-delimited files and map their lines to column headers that come from separate files.
> It would be nice to have a {{ReadAllFromTextWithFilename}} transform that modifies {{ReadAllFromText}} in the same way that {{ReadFromTextWithFilename}} modifies {{ReadFromText}}, so that it produces tuples of file names paired with text lines.
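
The requested output shape can be illustrated without Beam. The following is only a plain-Python sketch of the semantics described above (several file patterns in, (file name, line) tuples out); it is not the Beam transform itself, and the names read_all_with_filename and demo are hypothetical helpers introduced for illustration.

```python
import glob
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def read_all_with_filename(patterns: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (file_name, line) for every line of every file matching any pattern.

    Hypothetical helper: it mimics the tuple shape the issue asks for,
    using only the standard library instead of a Beam PTransform.
    """
    for pattern in patterns:
        # Sort matches so the output order is deterministic.
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    # Strip only the trailing newline; keep tabs intact.
                    yield path, line.rstrip("\n")


def demo() -> List[Tuple[str, str]]:
    """Write two small tab-delimited files and read them back with file names."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, body in [("a.tsv", "1\t2\n"), ("b.tsv", "3\t4\n5\t6\n")]:
            with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
                handle.write(body)
        pairs = read_all_with_filename([os.path.join(tmp, "*.tsv")])
        # Report base names only, since the temporary directory path varies.
        return [(os.path.basename(path), line) for path, line in pairs]
```

Because each line carries its file name, a downstream step can group or join lines by file, for example to look up the column headers that belong to that file.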
--
This message was sent by Atlassian Jira
(v8.3.4#803005)