Posted to commits@beam.apache.org by "Anant Bhandarkar (JIRA)" <ji...@apache.org> on 2017/05/08 08:50:04 UTC
[jira] [Created] (BEAM-2208) Apache Beam Python SDK is at least 5 times slower
Anant Bhandarkar created BEAM-2208:
--------------------------------------
Summary: Apache Beam Python SDK is at least 5 times slower
Key: BEAM-2208
URL: https://issues.apache.org/jira/browse/BEAM-2208
Project: Beam
Issue Type: Improvement
Components: runner-dataflow, sdk-py
Affects Versions: 0.6.0
Reporter: Anant Bhandarkar
Assignee: Daniel Halperin
Priority: Critical
I have been trying to run the Beam word count example on a 2 GB CSV file.
When I run the Java word count example on this file, the job completes in 7.15 minutes.
Job ID: 2017-04-18_23_57_02-2832613177376293063
The word count example on the same file using the Python SDK takes 28 to 35 minutes.
Job ID: 2017-04-20_04_48_27-8924552896141769408
SDK version: Apache Beam SDK for Python 0.6.0
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)