Posted to commits@beam.apache.org by "Ahmet Altay (JIRA)" <ji...@apache.org> on 2017/04/10 17:00:45 UTC

[jira] [Resolved] (BEAM-680) Python Dataflow stages stale requirements-cache dependencies

     [ https://issues.apache.org/jira/browse/BEAM-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmet Altay resolved BEAM-680.
------------------------------
       Resolution: Not A Problem
    Fix Version/s: Not applicable

> Python Dataflow stages stale requirements-cache dependencies
> ------------------------------------------------------------
>
>                 Key: BEAM-680
>                 URL: https://issues.apache.org/jira/browse/BEAM-680
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Scott Wegner
>            Priority: Minor
>             Fix For: Not applicable
>
>
> When executing a Python pipeline with a requirements.txt file, the Dataflow runner stages every file in its requirements cache directory: both the dependencies specified in requirements.txt and any dependencies cached by earlier runs. This results in a bloated staging directory if previous pipeline runs from the same machine used different dependencies.
> Repro:
> # Initialize a virtualenv and pip install apache_beam
> # Create an empty requirements.txt file
> # Create a simple pipeline using DataflowPipelineRunner and a requirements.txt file, for example: [my_pipeline.py|https://gist.github.com/swegner/6df00df1423b48206c4ab5a7e917218a]
> # {{touch /tmp/dataflow-requirements-cache/extra-file.txt}}
> # Run the pipeline with a specified staging directory
> # Check the staged files for the job
> 'extra-file.txt' will be uploaded with the job, along with any other cached dependencies under /tmp/dataflow-requirements-cache.
> We should only be staging the dependencies necessary for a pipeline, not all previously-cached dependencies found on the machine.
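The reported behavior can be modeled with a short, self-contained sketch. It assumes that staging amounts to "upload every regular file found in the cache directory", as the description above states; `files_to_stage` is a hypothetical helper, not part of the Beam API, and a throwaway temporary directory stands in for /tmp/dataflow-requirements-cache so the example does not touch a real cache.

```python
import os
import tempfile


def files_to_stage(cache_dir):
    """Hypothetical model of the reported behavior (not Beam's API):
    every regular file in the requirements cache is staged, whether or
    not requirements.txt asked for it."""
    return sorted(
        name
        for name in os.listdir(cache_dir)
        if os.path.isfile(os.path.join(cache_dir, name))
    )


# Simulate the repro: an empty requirements.txt (so nothing new is
# downloaded) plus a stray file left in the cache by an earlier run.
with tempfile.TemporaryDirectory() as cache_dir:
    open(os.path.join(cache_dir, "extra-file.txt"), "w").close()
    staged = files_to_stage(cache_dir)

print(staged)  # ['extra-file.txt']
```

Even though nothing was requested, the stray file is staged. Staging only what a pipeline needs would require tracking which cached files belong to the current requirements rather than listing the whole directory.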



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)