You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2015/12/02 14:49:00 UTC

[GitHub] flink pull request: 3057 py con

GitHub user zentol opened a pull request:

    https://github.com/apache/flink/pull/1432

    3057 py con

    This PR allows bidirectional data exchange during the plan phase.
    To do this the following changes were made:
    * replaced OneWayMappedFileConnection with a simple TCP connection
    * added an iterator to the python ExEnv to deserialize data during the plan phase 
    * added a new java Streamer class, providing the PlanBinder with a single access point to both send and receive data
    
    A proof-of-concept JobExecutionResult was added as well, providing access to getNetRuntime()
    
    Most of the changes in this PR ended up as a refactoring of related classes. The data exchange code was residing completely within the PythonStreamer, Sender and Receiver classes, regardless of the phase(plan or runtime)they are used. All these classes are now split into classes for the respective phase: PythonSender/-Receiver/-Streamer are used at runtime, PythonPlanSender/-Receiver/-Streamer during the plan phase. Additionally, these classes were 
    renamed to be more consistent, and moved into separate packages.
    
    These changes are mostly about consistency. Process spawning is **always** handled within a Streamer class, which is also **always** used as the access point for both sending and receiving data. There is also no longer a (non-obvious) overlap in methods used in both phases, which frequently caused issues when these were modified.
    
    This PR also includes a fix for FLINK-3014, switching from InputStream.read() to DataInputStream.readFully().

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zentol/flink 3057_py_con

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1432
    
----
commit 6df41cd929836f5dfa73edfec88a946deb57e1b0
Author: zentol <ch...@apache.org>
Date:   2015-11-22T15:23:29Z

    [FLINK-3057] Bidrectional plan connection

commit a91ea807235717738263eb84c3ae317f93caa7e4
Author: zentol <ch...@apache.org>
Date:   2015-11-22T15:24:30Z

    [FLINK-3014] Replace InStr.read() calls with DataInStr.readFully()

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3057][py] Bidirectional Connection

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1432


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3057][py] Bidirectional Connection

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1432#issuecomment-169329884
  
    I've quickly scrolled through the change. I'm wondering whether we should change the development process for the Python API a bit.
    It seems that you are currently the only committer working on the Python API. Maybe it makes sense that you push changes directly to `master` if they don't touch the rest of the system. Do you think that would improve the ease of development of the python API?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3057][py] Bidirectional Connection

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1432#issuecomment-170849507
  
    Well, in practice, even with PR's, there is not much QA going on for Python.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3057][py] Bidirectional Connection

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the pull request:

    https://github.com/apache/flink/pull/1432#issuecomment-170845951
  
    @rmetzger Well it surely would speed things up, making it very tempting to go this route. QA goes out the window though, which leaves me quite torn on this one :/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---