Posted to commits@beam.apache.org by "Robert Bradshaw (JIRA)" <ji...@apache.org> on 2018/03/20 06:23:00 UTC

[jira] [Resolved] (BEAM-1442) Performance improvement of the Python DirectRunner

     [ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Bradshaw resolved BEAM-1442.
-----------------------------------
    Resolution: Fixed

While one is rarely "done" with possible performance improvements, I'm going to close this issue as fixed for 2.4.0, given the significant improvements that now make the Python runner at least execute at a reasonable speed.

> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>
>                 Key: BEAM-1442
>                 URL: https://issues.apache.org/jira/browse/BEAM-1442
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Charles Chen
>            Priority: Major
>              Labels: gsoc2017, mentor, python
>             Fix For: 2.4.0
>
>
> The DirectRunners for Python and Java are intended to act as policy enforcers and correctness checkers for Beam pipelines, but some users also run real data processing tasks on them.
> Currently, the Python DirectRunner has less-than-great performance, although some work has already gone into improving it. There are still more opportunities for improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start on this problem, it is advisable to run a simple pipeline and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions directly on JIRA.


