Posted to user@beam.apache.org by Eila Oriel Research <ei...@orielresearch.org> on 2022/08/19 20:49:47 UTC

Unzip large file

Hi all,

I am looking to unzip a large gz file. Can I restrict the job to one worker
on the Dataflow runner and count on the lines staying in the same order as
in the original gz file? If not, what would be the easiest way to unzip the
file?

import apache_beam as beam

p = beam.Pipeline(options=options)
(p | 'Step 1.4.1 read gz file' >> beam.io.ReadFromText(zip_file_name)
   | 'Step 1.4.2 write file' >> beam.io.WriteToText(unaip_file_name))
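
For comparison, here is a minimal non-Beam sketch of the fallback I have in
mind, assuming the file can be copied to a local disk first. The function
name gunzip_file and the 1 MiB chunk size are illustrative choices of mine,
not part of any Beam API. It streams the archive through a single process,
so the output bytes (and therefore the lines) come out in the same order as
in the original file:

```python
import gzip
import shutil


def gunzip_file(src_path, dst_path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file to dst_path without loading it into memory.

    Assumption: src_path and dst_path are local filesystem paths.
    """
    # gzip.open in binary mode yields the decompressed bytes sequentially;
    # copyfileobj copies them chunk by chunk, so order is preserved.
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

This only uses the standard library, so it would not read a gs:// path
directly; the file would have to be downloaded first.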

Thank you,
--
Eila Arich

Founder, CEO

Oriel Research Therapeutics

https://www.orielresearch.com/blog
eila@orielresearch.com
www.orielresearch.com
Newton, MA