Posted to issues@beam.apache.org by "Anand Inguva (Jira)" <ji...@apache.org> on 2022/04/16 01:07:00 UTC
[jira] [Assigned] (BEAM-14176) Beam dataflow hangs with requirements.txt

     [ https://issues.apache.org/jira/browse/BEAM-14176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Inguva reassigned BEAM-14176:
-----------------------------------

    Assignee: Anand Inguva

> Beam dataflow hangs with requirements.txt
> -----------------------------------------
>
>                 Key: BEAM-14176
>                 URL: https://issues.apache.org/jira/browse/BEAM-14176
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Ryan Thompson
>            Assignee: Anand Inguva
>            Priority: P2
>
> Similar to this question:
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> Note: I could also work around this by using setup.py. However, it would be nice to get a clear error message instead of a hang.
>  
> When I use a requirements.txt file and deploy to Dataflow, Beam hangs.
> This was the last message logged:
> INFO:apache_beam.runners.portability.stager:Executing command: 
> ['/Users/ryanthompson/.virtualenvs/hackathon/bin/python', '-m', 'pip', 'download', '--dest', '/var/folders/6j/0z_b3j512gd6_mszhyy5p5qc0037d6/T/dataflow-requirements-cache', '-r', '/var/folders/6j/0z_b3j512gd6_mszhyy5p5qc0037d6/T/tmp68jk51_9/tmp_requirements.txt', '--exists-action', 'i', '--no-binary', ':all:']
> Here is a program that reproduces the issue:
> import argparse
> import logging
>
> import apache_beam as beam
> from apache_beam import Create
> from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
> import apache_beam.io.gcp.gcsfilesystem as gcsfs
> import py_midicsv as pm
>
> def midi_to_csv(file_name) -> str:
>     # Open the MIDI file on GCS and convert it to CSV.
>     fs = gcsfs.GCSFileSystem(PipelineOptions())
>     file = fs.open(file_name, 'rb')
>     return pm.midi_to_csv(file)
>
> def run(argv=None):
>     parser = argparse.ArgumentParser()
>     known_args, pipeline_args = parser.parse_known_args(argv)
>     # For gs testing.
>     input_filenames = ['gs://clouddfe-ryanthompson/hackathon/classical/bach/bach_846.mid']
>     output_name = 'gs://clouddfe-ryanthompson/hackathon/output/midi_out'
>     options = PipelineOptions(pipeline_args)
>     options.view_as(SetupOptions).save_main_session = True
>     options.view_as(SetupOptions).requirements_file = 'pipelines/requirements.txt'
>     with beam.Pipeline(options=options) as p:
>         input_pcol = p | Create(input_filenames)
>         mapped = input_pcol | 'Read File from GCS' >> beam.Map(midi_to_csv)
>         written = mapped | 'Write to output files' >> beam.Map(logging.info)
>
> if __name__ == '__main__':
>     logging.getLogger().setLevel(logging.INFO)
>     run()
>  
> Here is my requirements.txt file:
> py-midicsv
>  
> Other possibly relevant information:
> I tested with Python 3.6 on a MacBook, from the PyCharm console.
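
To check locally whether the `pip download` step quoted in the description is what stalls, the same command can be run outside Beam with a timeout. The following is a minimal sketch, not Beam's implementation: the flags are copied from the log message in the description, while `build_pip_download_command`, `run_with_timeout`, and the timeout value are hypothetical names and choices made for illustration. It does not explain why pip stalls; it only turns a silent hang into an error.

```python
import subprocess
import sys


def build_pip_download_command(requirements_file, cache_dir, python=sys.executable):
    """Rebuild the staging command with the flags shown in the logged message."""
    return [
        python, '-m', 'pip', 'download',
        '--dest', cache_dir,
        '-r', requirements_file,
        '--exists-action', 'i',
        '--no-binary', ':all:',
    ]


def run_with_timeout(command, timeout_seconds):
    """Run a command and raise a descriptive error instead of hanging.

    Hypothetical debugging helper; it is not part of Apache Beam.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising TimeoutExpired.
        raise RuntimeError(
            'Command did not finish within %s seconds: %s'
            % (timeout_seconds, ' '.join(command))) from exc
    output = completed.stdout.decode(errors='replace')
    if completed.returncode != 0:
        raise RuntimeError(
            'Command failed with exit code %d:\n%s'
            % (completed.returncode, output))
    return output
```

For example, `run_with_timeout(build_pip_download_command('pipelines/requirements.txt', '/tmp/dataflow-requirements-cache'), 600)` would either return pip's output or raise after ten minutes; the cache path and the ten-minute limit are arbitrary placeholders.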



--
This message was sent by Atlassian Jira
(v8.20.1#820001)