Posted to user@beam.apache.org by Eila Oriel Research <ei...@orielresearch.org> on 2022/08/19 20:49:47 UTC
Unzip large file
Hi all,
I am looking to unzip a large gz file. Can I restrict the job to 1 worker
on the Dataflow runner and count on the order of the lines staying the same
as in the original gz file? If not, what would be the easiest way to unzip
the file?
import apache_beam as beam

p = beam.Pipeline(options=options)
(p
 | 'Step 1.4.1 read gz file' >> beam.io.ReadFromText(zip_file_name)
 | 'Step 1.4.2 write file' >> beam.io.WriteToText(unaip_file_name))
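For comparison, here is a minimal sketch of an order-preserving route that does not go through a pipeline at all. It assumes the gz file is reachable as a local path on a single machine (not a gs:// URL), and the helper name gunzip_file and its chunk_size parameter are hypothetical, made up for illustration. It streams the decompressed bytes straight into the output file, so the lines come out in the same order as in the original and memory use stays bounded.

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file to dst_path (hypothetical helper).

    Assumes both paths are local files. Bytes are copied sequentially,
    so line order in the output matches the original.
    """
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj moves the data in chunks of chunk_size bytes, so the
        # whole file is never held in memory at once.
        shutil.copyfileobj(src, dst, chunk_size)
```

Usage would be something like gunzip_file("big_file.txt.gz", "big_file.txt"), where both file names are placeholders. For a file that lives in cloud storage, the two file objects would have to come from a storage-aware API instead of gzip.open and open.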
Thank you,
--
Eila Arich
Founder, CEO
Oriel Research Therapeutics
https://www.orielresearch.com/blog
eila@orielresearch.com
www.orielresearch.com
Newton, MA