Posted to commits@beam.apache.org by "Ahmet Altay (JIRA)" <ji...@apache.org> on 2017/12/22 19:21:00 UTC
[jira] [Resolved] (BEAM-2208) Python SDK wordcount on cloud Dataflow runner is slow
[ https://issues.apache.org/jira/browse/BEAM-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmet Altay resolved BEAM-2208.
-------------------------------
Resolution: Cannot Reproduce
Fix Version/s: Not applicable
> Python SDK wordcount on cloud Dataflow runner is slow
> -----------------------------------------------------
>
> Key: BEAM-2208
> URL: https://issues.apache.org/jira/browse/BEAM-2208
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow, sdk-py-core
> Affects Versions: 0.6.0
> Reporter: Anant Bhandarkar
> Assignee: Ahmet Altay
> Priority: Critical
> Fix For: Not applicable
>
>
> I have been trying to run the Beam WordCount example on a 2 GB CSV file.
> When I run the Java WordCount example on this file, the job completes in about 7.15 minutes.
> Job ID: 2017-04-18_23_57_02-2832613177376293063
> But the WordCount example on the same file using the Python SDK takes 28 to 35 minutes.
> Job ID: 2017-04-20_04_48_27-8924552896141769408
> SDK version: Apache Beam SDK for Python 0.6.0