You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/06/22 16:32:01 UTC

[jira] [Commented] (FLINK-2097) Add support for JobSessions

    [ https://issues.apache.org/jira/browse/FLINK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595987#comment-14595987 ] 

ASF GitHub Bot commented on FLINK-2097:
---------------------------------------

GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/858

    [FLINK-2097] Implement job session management

    This is a joint effort by @StephanEwen and me to introduce a session management in Flink. Session are used to keep a copy of the ExecutionGraph in the job manager for the session lifetime. It is important that the ExecutionGraph is not kept around longer because it consumes a lot of memory. Its intermediate results can also be freed. To integrate sessions properly into Flink, some refactoring was necessary. In particular these are:
    
    - JobId is created through the ExecutionEnvironment and passed through
    - Sessions can be termined by the ExecutionEnvironment or directly through the executor
    - Session are cancelled implicitly through "reapers" or shutdown hooks in the ExecutionEnvironment, otherwise they time out
    - LocalExecutor and RemoteExecutor manage sessions
    - The Client only deals with the communication with the job manager and is agnostic of session management
    
    With the session management, we will be able to properly support backtracking of produced intermediate results. This makes calls to count()/collect()/print() efficient and enables to write incremental/interactive jobs. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink session-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #858
    
----
commit 9852d392bfe69b056596acfba001ab0a574f0ac0
Author: Maximilian Michels <mx...@apache.org>
Date:   2015-05-13T15:06:47Z

    [FLINK-2097] [core] Implement job session management.
    
    Sessions make sure that the JobManager does not immediately discard a JobGraph after execution, but keeps it
    around for further operations to be attached to the graph. That is the basis if interactive sessions.
    
    This pull request implements a rudimentary session management. Together with the backtracking #640,
    this will enable users to submit jobs to the cluster and access intermediate results. Session handling ensures that the results are cleared eventually.
    
    ExecutionGraphs are kept as long as
    - no timeout occurred or
    - the session has not been explicitly ended

commit 65464ad19d39a29d41d071b2a4524b414e297147
Author: Stephan Ewen <se...@apache.org>
Date:   2015-05-29T12:35:33Z

    [FLINK-2097] [core] Improve session management.
    
     - The Client manages only connections to the JobManager, it is not job specific
     - Executors provide a more explicit life cycle and methods to start new sessions
     - Sessions are handled by the environments
     - The environments use reapers (local) and shutdown hooks (remote) to ensure session termination
       when the environment runs out of scope

commit 6d89edd4a63fa3971c0246f46c7b8c98f3fc6c30
Author: Maximilian Michels <mx...@apache.org>
Date:   2015-06-18T14:38:09Z

    [FLINK-2097] [core] Finalize session management

----


> Add support for JobSessions
> ---------------------------
>
>                 Key: FLINK-2097
>                 URL: https://issues.apache.org/jira/browse/FLINK-2097
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>    Affects Versions: 0.9
>            Reporter: Stephan Ewen
>            Assignee: Maximilian Michels
>             Fix For: 0.9
>
>
> Sessions make sure that the JobManager does not immediately discard a JobGraph after execution, but keeps it around for further operations to be attached to the graph. By keeping the JobGraph around, the cached streams (intermediate data) are also kept,
> That is the way of realizing interactive sessions on top of a streaming dataflow abstraction.
> ExecutionGraphs should be kept as long as
> - no timeout occurred or
> - the session has not been explicitly ended



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)