Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2014/07/04 23:19:33 UTC

[jira] [Commented] (FLINK-671) Python interface for new API (Map/Reduce)

    [ https://issues.apache.org/jira/browse/FLINK-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052671#comment-14052671 ] 

Chesnay Schepler commented on FLINK-671:
----------------------------------------

Status Update:
* Python processes are now shut down cleanly instead of being destroyed.
* Added a trivial start script so that the user doesn't need to specify -j /lib/stratosphere-language-binding...jar every time.
* A benchmark revealed horrendous performance (~150x worse than the JAPI).
** Profiling identified Protocol Buffers as the cause.
** => Protocol Buffers removed completely.
*** New protocol that works directly on the binary representation of the data (which, luckily, is compatible between Java and Python).
*** Rewrote (and finally simplified) the logic on Python's receiving end.
*** Spent some time on further optimizations.
*** A small local WordCount (200k identical lines with 4 words each) is now only 4x worse than the JAPI.
*** Couldn't run a 100GB WordCount successfully on the cluster: every time, after ~18 minutes, invalid data is sent to Java. I haven't found the cause yet.
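The comment doesn't spell out the new wire format. As a rough sketch of the general idea only, assuming the Java side writes values with java.io.DataOutputStream (which always uses big-endian byte order), Python's struct module can produce and consume the same bytes via the ">" format prefix. The one-byte type tags, the TYPE_* constants, the write_value/read_value helpers and the length-prefixed UTF-8 string framing are all hypothetical and are not taken from the attached patch.

```python
import io
import struct

# Hypothetical framing (assumption, NOT the actual FLINK-671 protocol):
# each value = 1-byte type tag + its big-endian binary representation,
# i.e. the byte order Java's DataOutputStream uses.
TYPE_INT = 0     # 4-byte signed int        (Java: out.writeInt)
TYPE_LONG = 1    # 8-byte signed long       (Java: out.writeLong)
TYPE_DOUBLE = 2  # 8-byte IEEE 754 double   (Java: out.writeDouble)
TYPE_STRING = 3  # 4-byte length + UTF-8    (Java: writeInt(b.length); write(b))


def write_value(stream, value):
    """Serialize one value in the hypothetical tagged big-endian format."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            stream.write(struct.pack(">bi", TYPE_INT, value))
        else:
            stream.write(struct.pack(">bq", TYPE_LONG, value))
    elif isinstance(value, float):
        stream.write(struct.pack(">bd", TYPE_DOUBLE, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        stream.write(struct.pack(">bi", TYPE_STRING, len(data)))
        stream.write(data)
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")


def _read_exact(stream, n):
    """Read exactly n bytes or fail loudly (a short read means a broken stream)."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_value(stream):
    """Deserialize one value; unknown tags are rejected instead of guessed."""
    (tag,) = struct.unpack(">b", _read_exact(stream, 1))
    if tag == TYPE_INT:
        return struct.unpack(">i", _read_exact(stream, 4))[0]
    if tag == TYPE_LONG:
        return struct.unpack(">q", _read_exact(stream, 8))[0]
    if tag == TYPE_DOUBLE:
        return struct.unpack(">d", _read_exact(stream, 8))[0]
    if tag == TYPE_STRING:
        (length,) = struct.unpack(">i", _read_exact(stream, 4))
        return _read_exact(stream, length).decode("utf-8")
    raise ValueError(f"unknown type tag: {tag}")


if __name__ == "__main__":
    buf = io.BytesIO()
    for v in ["hello", 1, 2**40, 0.5]:
        write_value(buf, v)
    buf.seek(0)
    print([read_value(buf) for _ in range(4)])  # ['hello', 1, 1099511627776, 0.5]
```

Because no schema or intermediate object model is involved, each value costs only a tag byte plus its raw bytes. Strict reads (exact lengths, rejecting unknown tags) also make a corrupted stream fail at the first bad byte rather than silently producing garbage.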

> Python interface for new API (Map/Reduce)
> -----------------------------------------
>
>                 Key: FLINK-671
>                 URL: https://issues.apache.org/jira/browse/FLINK-671
>             Project: Flink
>          Issue Type: Improvement
>          Components: Python API
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>              Labels: github-import
>             Fix For: pre-apache
>
>         Attachments: pull-request-671-9139035883911146960.patch
>
>
> ([#615|https://github.com/stratosphere/stratosphere/issues/615] | [FLINK-615|https://issues.apache.org/jira/browse/FLINK-615])
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/671
> Created by: [zentol|https://github.com/zentol]
> Labels: enhancement, java api, 
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Apr 09 20:52:06 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)