Posted to commits@beam.apache.org by "Charles Chen (JIRA)" <ji...@apache.org> on 2018/03/09 10:20:00 UTC
[jira] [Updated] (BEAM-1442) Performance improvement of the Python DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated BEAM-1442:
-------------------------------
Fix Version/s: 2.4.0
> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>
> Key: BEAM-1442
> URL: https://issues.apache.org/jira/browse/BEAM-1442
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Charles Chen
> Priority: Major
> Labels: gsoc2017, mentor, python
> Fix For: 2.4.0
>
>
> The DirectRunners for Python and Java are intended to act as policy enforcers and correctness checkers for Beam pipelines, but some users also run data processing tasks on them.
> Currently, the Python DirectRunner performs poorly, although some work has already gone into improving it. There are more opportunities for improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To get started on this problem, run a simple pipeline and study the `Pipeline.run` and `DirectRunner.run` methods. Ask questions directly on JIRA.
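
One possible way to begin studying where the time goes in a run is to profile it with the standard-library `cProfile` module. The sketch below is only an illustration under stated assumptions: `profile_call` and `run_simple_pipeline` are hypothetical names that do not appear in the issue, and the workload is a pure-Python stand-in so that the snippet runs without Beam installed. In practice the body of `run_simple_pipeline` would build and run a small Beam pipeline on the DirectRunner, as hinted in the comments.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=10):
    """Run fn under cProfile; return (fn's result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the outermost, most expensive calls come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def run_simple_pipeline():
    """Hypothetical stand-in workload: a tiny word count in plain Python.

    Assumption: with Beam installed, this body could be replaced by
    something along the lines of

        import apache_beam as beam
        with beam.Pipeline(runner="DirectRunner") as p:
            (p
             | beam.Create(["a", "b", "a"])
             | beam.Map(lambda w: (w, 1))
             | beam.CombinePerKey(sum))
    """
    counts = {}
    for word in ["a", "b", "a"] * 1000:
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    result, report = profile_call(run_simple_pipeline)
    print(report)
```

Sorting by cumulative time tends to surface the top-level entry points first, which is a reasonable starting place when the goal is to see how much of a run is spent inside the runner rather than in user code.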